Kamal Das – Jigsaw Academy https://www.jigsawacademy.com Jigsaw Tue, 22 Nov 2022 11:19:25 +0000 en-US hourly 1 https://wordpress.org/?v=6.1.1 https://www.jigsawacademy.com/wp-content/uploads/2021/09/cropped-favicon-1-32x32.jpg Kamal Das – Jigsaw Academy https://www.jigsawacademy.com 32 32 How to Make A Successful Comeback After A Career Break https://www.jigsawacademy.com/blogs/expert-speak/how-to-make-a-successful-comeback-after-a-career-break/ https://www.jigsawacademy.com/blogs/expert-speak/how-to-make-a-successful-comeback-after-a-career-break/#respond Mon, 02 Aug 2021 05:41:07 +0000 https://www.jigsawacademy.com/?p=196661

The post How to Make A Successful Comeback After A Career Break appeared first on Jigsaw Academy.
At a recent training session for fresher hires in an MNC’s analytics training program, my colleague Dr. Chetana highlighted that only 10% of the hires were women. TrustRadius reported that in 2021, 72% of women in tech were outnumbered by men in business meetings by at least a 2:1 ratio. Women make up less than a third of the employees in many tech companies. As per the latest data from Catalyst, women hold only 5.8% of the CEO positions in S&P 500 companies in the US!


Women constitute a low percentage of the student intake in most of our premier engineering colleges. A quick check of the top 10 engineering colleges in India by NIRF ranking shows similarly low female participation.

 

 

Rank | Engineering College | Student Strength in 4-year UG Program | No. of Female Students | % of Female Students
1 | Indian Institute of Technology Madras | 1,814 | 272 | 15.0%
2 | Indian Institute of Technology Delhi | 2,856 | 345 | 12.1%
3 | Indian Institute of Technology Bombay | 2,795 | 307 | 11.0%
4 | Indian Institute of Technology Kanpur | 3,184 | 303 | 9.5%
5 | Indian Institute of Technology Kharagpur | 2,251 | 224 | 10.0%
6 | Indian Institute of Technology Roorkee | 3,123 | 321 | 10.3%
7 | Indian Institute of Technology Guwahati | 2,470 | 267 | 10.8%
8 | Indian Institute of Technology Hyderabad | 969 | 186 | 19.2%
9 | National Institute of Technology Tiruchirappalli | 2,719 | 662 | 24.3%
10 | Indian Institute of Technology Indore | 798 | 88 | 11.0%
 | Top 5 Engineering colleges as per NIRF | 12,900 | 1,451 | 11.2%
 | Top 10 Engineering colleges as per NIRF | 22,979 | 2,975 | 12.9%

Data Source: NIRF
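
For readers who want to check the aggregate rows, here is a minimal Python sketch that recomputes the female share for the top 5 and top 10 colleges from the per-college figures in the table. The shortened college names and the helper name female_share are illustrative assumptions, not part of the NIRF data.

```python
# (college, students in 4-year UG program, female students), copied from the table above
ROWS = [
    ("IIT Madras", 1814, 272),
    ("IIT Delhi", 2856, 345),
    ("IIT Bombay", 2795, 307),
    ("IIT Kanpur", 3184, 303),
    ("IIT Kharagpur", 2251, 224),
    ("IIT Roorkee", 3123, 321),
    ("IIT Guwahati", 2470, 267),
    ("IIT Hyderabad", 969, 186),
    ("NIT Tiruchirappalli", 2719, 662),
    ("IIT Indore", 798, 88),
]


def female_share(rows):
    """Return the pooled female share (in %) across the given rows, to one decimal."""
    total = sum(row[1] for row in rows)
    female = sum(row[2] for row in rows)
    return round(100 * female / total, 1)


print("Top 5:", female_share(ROWS[:5]))
print("Top 10:", female_share(ROWS))
```

Note that the aggregate is a pooled share (total women over total students), not an average of the individual percentages.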

Not that it was always the case! Women were pioneers in computing and made significant contributions to the field. History is written by the winners, in this case predominantly men, and much of this contribution went unattributed for decades. In an earlier article, I wrote about why it is critical to have more diversity in AI. Lower female participation in AI has the unintended consequence of making the future more biased.

India has among the lowest female workforce participation rates in the world! According to World Bank data, a mere 20.79% of the workforce (ages 15+) in India is female, compared with 47.29% worldwide. Alarmingly, this percentage has been falling since 2005, and initial reports by agencies including Oxfam and TrustRadius suggest the COVID-19 pandemic has further worsened this trend in India.


Image source: World Bank

Besides low participation, women also struggle with salary disparity, often being paid 20% less for equivalent work. This is compounded by the unequal distribution of work at home and by their role as primary caregivers. The challenges are greater still for women looking to return from a career break, which may have been taken for many reasons, including marriage, maternity, a spouse’s move to a foreign country where they did not have a work visa, or caring for elderly family members.

Over the last few months, as hiring has resumed post-COVID, it has been heartening to see many companies ask us for focused hiring of women to improve the demographic mix of employees. We think it’s a step in the right direction to have a more balanced workforce.

Returning to work is a difficult transition for many, and the experiences of many of our female students have reinforced the message that it is unlikely to be smooth. Here are a few things that can help make the process less difficult:

1. Ask your network for help

Many people make the process of getting back to work personal and do not involve the vast networks that know and trust them. Reach out and make it known you are back in the job market. The kindness of strangers and your network often springs a pleasant surprise!

2. Don’t just apply to jobs; converse with stakeholders

Use professional networks like LinkedIn not just to apply to jobs but to connect with people in the prospective company. Understand the work culture, check your fit and possibly even get a job referral. Reaching out to hiring managers and recruitment personnel in your target companies helps bring you to their attention. 

3. Apply to companies that have return-to-work web pages

Many companies have specific web pages for women returning to work. I have listed out some of them to help you get an idea. 

It often makes sense to apply through this channel to improve your chances of being interviewed.

4. Build skills and get certifications in areas of interest

Look at re-skilling yourself in an area of interest. Thanks to MOOCs and the booming EdTech sector, there are plenty of ways to gain skills and certifications. Add them to your resume to make it stand out. Also add projects and any other work you have done, to help distinguish you from the applicant pool.

5. Keep the faith! 

A job hunt is a lonely and often disheartening experience. No responses after multiple applications, interviews that did not go well, and offers put on hold for trivial reasons can make the process difficult and depressing. Keep the faith. Your job may be just one application away!

Overall, it is essential to have a pathway for everyone to seek meaningful employment. It is heartening to see many companies taking the step to help women return to the workforce and reinvent their careers. And it is essential to have more women back at work, both for the nation’s prosperity and to make the workplace more equitable and fair!

Remember, a setback is but a setup for a great comeback. So, don’t lose hope and continue your job hunt! Good luck!

AI needs Diversity to reduce Gender and Racial Bias! https://www.jigsawacademy.com/blogs/expert-speak/ai-needs-diversity-to-reduce-gender-and-racial-bias/ https://www.jigsawacademy.com/blogs/expert-speak/ai-needs-diversity-to-reduce-gender-and-racial-bias/#respond Mon, 19 Jul 2021 12:53:34 +0000 https://www.jigsawacademy.com/?p=194936

The post AI needs Diversity to reduce Gender and Racial Bias! appeared first on Jigsaw Academy.
Artificial Intelligence is the new electricity, powering a technological revolution just as electricity once did, believes Coursera co-founder Andrew Ng. However, AI has a significant gender and racial bias.

MIT discusses how computer vision is great at recognizing light-skinned males but poor at recognizing darker-skinned females. The ability of computer vision algorithms to recognize dark-skinned females is 20% to 34% poorer than their ability to recognize light-skinned males.

Research by the University of Colorado Boulder highlights the difficulty in identifying transwomen and transmen. 

In the research paper Diversity in Faces by IBM Research AI, the authors highlight that most computer vision training datasets are predominantly focused on light-skinned males. Light-skinned people constitute between 80% and 95% of the images in most training databases. The datasets are also predominantly male. Historically as well, camera manufacturers have focused on light-skinned people and placed less emphasis on capturing other skin tones appropriately.

This results in computer vision algorithms misclassifying a throwback image of the former First Lady of the US (FLOTUS) as “a young man wearing a black shirt”! Why? A mere 2.5% of Google employees are Black, as per its 2018 report! Women are also under-represented, comprising around 20% of the workforce in big tech companies as per a report by Bloomberg.

 

Source: MIT 6.S191, Introduction to Deep Learning

Like many other spheres in life, we need more diversity in AI. We need to actively promote people from diverse and under-represented backgrounds to join and share their views on the development of AI. Otherwise, needless to say, AI will remain biased in terms of culture, race and gender. And like a recent pop song, we’ll be left complaining “Tuada Kutta Tommy Sada Kutta Kutta.” 

Models behaving badly: Understanding the poor performance of Covid-19 projections https://www.jigsawacademy.com/models-behaving-badly-understanding-the-poor-performance-of-covid-19-projections/ https://www.jigsawacademy.com/models-behaving-badly-understanding-the-poor-performance-of-covid-19-projections/#respond Thu, 04 Jun 2020 06:18:08 +0000 https://analyticstraining.com/?p=16255

The post Models behaving badly: Understanding the poor performance of Covid-19 projections appeared first on Jigsaw Academy.
When the novel coronavirus hit human societies, many experts created projections of how the disease would spread. Most of these efforts now appear fairly poor. Understanding why they did not capture reality is a useful step towards doing a better job during such events in the future. This article looks at why they failed and attempts to predict the spread of the coronavirus using various analytical tools.

https://prime.economictimes.indiatimes.com/news/76014848/pharma-and-healthcare/models-behaving-badly-understanding-the-poor-performance-of-covid-19-projections

Visualizing geographic data using Plotly in Python https://www.jigsawacademy.com/visualizing-geographic-data-using-plotly-in-python/ https://www.jigsawacademy.com/visualizing-geographic-data-using-plotly-in-python/#respond Thu, 14 May 2020 09:26:44 +0000 https://analyticstraining.com/?p=16174

The post Visualizing geographic data using Plotly in Python appeared first on Jigsaw Academy.
In 2019, IEEE ranked Python as the top programming language in the world (Link: https://spectrum.ieee.org/static/interactive-the-top-programming-languages-2019). Many data scientists and business analytics professionals still felt that R was better, specifically in the areas of statistical analysis and visualization.

Python has a lot of libraries for visualization, including matplotlib and seaborn. Earlier, visualizing geographic data was a challenge, as Geopandas was not simple to use. With Plotly, this has changed.

We share a Colab notebook that visualizes geographic data using Plotly in Python. Link to the file:

https://colab.research.google.com/drive/1qefpxRj52tH4D380MdPVc9RG9FZjY3Ht

Please feel free to copy it to your drive and run it to see how COVID-19 moved from a local health concern to a global pandemic.

We took the raw time-series data from the widely used Johns Hopkins repo, processed it and then used Plotly to show the worldwide spread of coronavirus over time. We have four animated charts, built as choropleth maps in Python with Plotly, for global confirmed cases, deaths, recovered cases and existing cases.

Using the play button, you can see how the cases spread over time. Using Plotly, we showcase how simple it is to build an automated geographical visualisation.
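
The sketch below illustrates the general shape of such a chart; it is not the code from the Colab file. It assumes the data has one row per country and one column per date. The helper names to_long and build_figure, the column names and the sample counts are all hypothetical. The reshaping step uses only the standard library, and Plotly is imported inside build_figure so that the rest runs even where Plotly is not installed.

```python
def to_long(wide_rows, date_columns):
    """Reshape one-row-per-country records into one row per (country, date)."""
    long_rows = []
    for row in wide_rows:
        for date in date_columns:
            long_rows.append(
                {"country": row["country"], "date": date, "confirmed": row[date]}
            )
    return long_rows


def build_figure(long_rows):
    """Build an animated choropleth; the play button steps through the dates."""
    import plotly.express as px  # imported lazily; requires Plotly to be installed

    # Plotly Express accepts a dict of equal-length columns as its data frame
    columns = {
        key: [row[key] for row in long_rows]
        for key in ("country", "date", "confirmed")
    }
    return px.choropleth(
        columns,
        locations="country",
        locationmode="country names",
        color="confirmed",
        animation_frame="date",
    )


# Hypothetical sample values, not real case counts
sample = [
    {"country": "India", "2020-04-01": 10, "2020-04-02": 20},
    {"country": "Italy", "2020-04-01": 30, "2020-04-02": 40},
]
long_rows = to_long(sample, ["2020-04-01", "2020-04-02"])
print(long_rows[0])
# build_figure(long_rows).show()  # uncomment where Plotly is installed
```

The animation_frame argument is what produces the play button and date slider; each distinct date value becomes one frame.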

Hope you enjoy it!

Looking to learn Python, data analytics and visualization? Try out our Postgraduate Diploma In Data Science (PGD-DS) (Link: https://www.jigsawacademy.com/post-graduate-diploma-in-data-science-pgds-certification-training/) and Integrated Program In Business Analytics (IPBA) (Link: https://www.jigsawacademy.com/integrated-program-in-business-analytics/) programs!

COVID-19: An attempt to predict Confirmed Cases in India https://www.jigsawacademy.com/covid-19-an-attempt-to-predict-confirmed-cases-in-india/ https://www.jigsawacademy.com/covid-19-an-attempt-to-predict-confirmed-cases-in-india/#respond Thu, 23 Apr 2020 10:44:21 +0000 https://analyticstraining.com/?p=16093

The post COVID-19: An attempt to predict Confirmed Cases in India appeared first on Jigsaw Academy.
The COVID-19 pandemic continues to ravage the world. Even as global infections crossed 2.6 million, India’s number at around 21,370 seems modest, given we are home to one-sixth of the world’s population. Based on data from Johns Hopkins, in per capita terms, only 16 in a million people in India are infected by COVID-19, vs 338 in a million people globally (as of 22nd April 2020). Things in India are not as bad… but what does the future look like?

Given my interest in numbers and trends, I have been trying to figure out if we could forecast the trends for COVID-19. I requested the data behind popular forecasts, including those from Johns Hopkins-CDDEP and BCG, but these were allegedly disputed or not for public dissemination, and I did not get a response. In general, I noticed that most of the forecasts did not provide day-wise numbers. On a log scale without supporting numbers, it was difficult to decipher what the forecasters wanted to say from the presentations and reports. I would not have been able to read even my own forecast chart without the accompanying numbers.

Forecasts by US experts compiled by fivethirtyeight showed huge variation. My colleague Gunnvant has created a data scraper and visualization tool for COVID-19. However, I did not find any good forecasts for India.

Some bad ones are out there. “A five-member Central team has projected that the number of COVID-19 cases in Mumbai will touch an estimated 42,604 by April 30 and spiral to 6,56,407 by May 15. Based on mathematical modelling for Mumbai by the Union Ministry of Health on April 16

Source: The Hindu

The assumptions are too simplistic: a doubling time of 3.8 days is maintained throughout the forecast period. Such high numbers are great for scaremongering, grabbing eyeballs and making headlines. The state government is disputing these numbers, as it should. Such “mathematical modelling” has been done by team members who understand neither mathematics nor modelling. These forecasts add negligible value. May I direct these ill-trained forecasters to some courses at Jigsaw Academy …
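
As a quick check on the projection quoted above, the Python sketch below backs out the doubling time implied by the two Mumbai figures (42,604 on April 30 and 6,56,407 on May 15, i.e. 15 days apart). It assumes pure exponential growth between the two dates, which is exactly the simplification being criticised. The function names are illustrative.

```python
import math


def implied_doubling_time(cases_start, cases_end, days):
    """Doubling time (in days) implied by constant exponential growth."""
    return days * math.log(2) / math.log(cases_end / cases_start)


def project(cases_start, doubling_time, days):
    """Cases after `days` if the doubling time never changes."""
    return cases_start * 2 ** (days / doubling_time)


# Figures from the projection quoted above: April 30 to May 15 is 15 days
doubling = implied_doubling_time(42604, 656407, 15)
print(round(doubling, 1))                    # implied doubling time in days
print(round(project(42604, doubling, 15)))   # reproduces the May 15 figure
```

Holding the doubling time fixed means cases multiply roughly fifteen-fold in fifteen days, which is why the projected number balloons so quickly.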

Given my absolute lack of knowledge on diseases, I was initially hesitant to try to forecast it. I take solace from the words of Mark Weir of Ohio State’s ecology, epidemiology, and population health program:

Source: fivethirtyeight.com

I looked at this as a data forecasting problem and decided to build a simple time series model. Having spent over a decade forecasting revenues, profits and the unknowable stock prices of my coverage universe, I was used to being wrong and forecasting things I had no idea of! Here is the result, the link to my COVID-19 confirmed infections predictions for India: https://docs.google.com/spreadsheets/d/1dc9hwCSz7hoqkgymPghar0AnN80weDgRICQ2qXrmxB0/edit?usp=sharing

When I built the model, these were the things I wanted to have:

  1. A narrow gap between the upper and lower bounds of the estimates. The range is therefore not a 95% likelihood interval.
  2. A mean estimate that would hopefully have less than 5% error from actuals.
  3. Steady, sticky estimates that would update as new information came in but not be too sensitive to minor changes.

You may view the details in the Google sheet. However, you will not be able to edit or change anything. You may copy it to your own Google Drive if you would like to make any changes. All changes in the forecast are recorded, and ideally these will be updated once a day.

The data is sourced from Johns Hopkins (details in the Google spreadsheet). As some of the data is country-wise and some is state-wise (for countries like the US, China and Australia), we use groupby in Python to aggregate it by country and download the result as an Excel file. We use a simple time series forecasting model to predict the number of confirmed COVID-19 infections over the next seven days. We also highlight the upper and lower bounds of the estimates, and we check the difference between our mean estimate and the actual numbers. My daily forecasts are available from 11th April, and since then the actual number has been within 5% of the forecast. Here are my forecasts for the next seven days.
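
The two data-handling steps described above, rolling state-wise rows up to country totals and checking a forecast against the actual number, can be sketched as follows. This is a minimal illustration, not the workflow behind the spreadsheet: it uses the standard library in place of pandas groupby, and the row layout, function names and numbers are hypothetical.

```python
from collections import defaultdict


def total_by_country(rows):
    """Sum confirmed cases per country, collapsing state-wise rows.

    Plays the same role as a pandas groupby on the country column followed by sum.
    """
    totals = defaultdict(int)
    for country, _state, confirmed in rows:
        totals[country] += confirmed
    return dict(totals)


def pct_error(forecast, actual):
    """Absolute percentage error of a forecast relative to the actual value."""
    return 100 * abs(forecast - actual) / actual


# Hypothetical rows: (country, state or None, confirmed cases)
rows = [
    ("Australia", "New South Wales", 100),
    ("Australia", "Victoria", 50),
    ("India", None, 200),
]
totals = total_by_country(rows)
print(totals)

# Was a (hypothetical) forecast of 210 within the 5% target?
print(pct_error(forecast=210, actual=totals["India"]) <= 5)
```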

Source: https://docs.google.com/spreadsheets/d/1dc9hwCSz7hoqkgymPghar0AnN80weDgRICQ2qXrmxB0/edit?usp=sharing

The model is a work in progress and needs some fine-tuning. The lower bound is easier to predict, as it can’t be less than actuals. The upper bound needs to be tested, especially once the lockdown is lifted and the rate of spread may increase. I look forward to extending the duration of the forecast and to seeing if we can predict the peak of the infection in India. I hope to share the model soon.

Given these limitations, honestly, I am surprised the simple model has reasonably good predictive power. And I decided to post it on a public forum to (i) make myself update it daily (ii) see if the model continues to be as good in predicting the numbers, especially in public scrutiny! 

Note that my predictions keep changing each day as fresh data comes in. My prediction for today’s (23rd April) confirmed cases has increased by 4% over the last seven days. I am searching for the peak and hoping to see the numbers fall. Hopefully, my numbers will prove excessive and we will see them reduce… Unfortunately, the forecasts seem to be edging up. All models are right until they go wrong! Hopefully, this one errs by predicting too much, and the numbers end up being lower than forecast…

Let’s have a more sensible discussion on numbers and expectations. I estimated that India would be at around 11,000 confirmed infections on 14th April and that there would be a push to keep the lockdown intact. With cases around 20,000 currently, going to around 35,000 by 30th April and expected to cross 40,000 by 3rd May, are we looking at least at a partial lockdown continuing? We will know soon enough…

Ok, we all agree that Mumbai hitting 6.5 lakh cases by 15th May is baloney. However, while the experts in the Union Ministry of Health  expect over 42,000 confirmed cases by 30th April, I have the audacity to suggest that the whole of India will have less than 42,000 cases by 30th April?

Yes, I do. Game on! And because I back myself, may the better forecaster win!

Disclaimer:

I offer my views, with the knowledge that diseases, medicine and healthcare are not my area of expertise. This is an attempt in predictive time series analysis. There are a lot of bad models out there, and I am confident this will be better than most.

Also, given that many discussions on the topic have been polarized by political leanings and viewpoints, I would like to stress that these are not to promote any ideology or offer judgment on government policy decisions.

My only wish is that the government, both state and central, focuses on improving healthcare infrastructure and facilities in India, while leaving the forecasting to those who can!

 

Model Thinking: COVID-19 – A brief discussion on Confusion Matrix https://www.jigsawacademy.com/model-thinking-covid-19-a-brief-discussion-on-confusion-matrix/ https://www.jigsawacademy.com/model-thinking-covid-19-a-brief-discussion-on-confusion-matrix/#respond Mon, 06 Apr 2020 10:50:43 +0000 https://analyticstraining.com/?p=15959

The post Model Thinking: COVID-19 – A brief discussion on Confusion Matrix appeared first on Jigsaw Academy.
A brief discussion on Confusion Matrix – As the COVID-19 pandemic continues to ravage the world, India has done remarkably well. Even as global infections have crossed 1.2 million, India’s number, at around 4,000, seems modest, given we are home to one-sixth of the world’s population. In per capita terms, only 3 in a million people in India are infected by COVID-19, vs 156 in a million people globally (as of 5th April 2020).

A number of reasons have been suggested, including a smaller number of tests, BCG vaccine usage in India, the Indian strain being less virulent, and Indians being more resistant to infections.

In this context, I seek to use the confusion matrix (also called an error matrix, or a matching matrix in unsupervised learning) to discuss the impact of higher testing.

Students of my Quantitative Methods class would be familiar with the chart below.  The confusion matrix tends to confuse a lot of the students. Its implications are underappreciated by many practitioners of statistics.

In brief, for any test, in medicine or otherwise, the results may have some error. These are classified as:

1. True Positive: A true positive is the correct affirmation of the presence of a condition

For example, concluding a pregnant lady is pregnant would be “True Positive”

2. False Negative: A false negative is an error in which a test result improperly indicates no presence of a condition (the result is negative), when in reality, the condition is present.

For example, concluding a pregnant lady is not pregnant would be “False Negative”

3. False Positive: A false positive is the error in affirmation of the presence of a condition

For example, concluding a man is pregnant would be “False Positive”

4. True Negative: A true negative is the correct affirmation that the condition is not present 

For example, concluding a man is not pregnant would be “True Negative”

For any tests, some error will occur. Some infected people will be assumed to be uninfected (false negatives) and some uninfected people will show up being infected (false positives). What are the true positive rates (also called sensitivity) for the COVID-19 tests? We don’t know yet. 

Let’s assume (this is hypothetical and not meant to be a forecast), that: 
  • Instead of the currently reported 3 infections per million population, the actual infection rate in India is 10 people per million (or 13,000 actual infections vs 4,000 reported)
  • The tests are very accurate and the sensitivity is 99.9% (or 99.9% of the infections are correctly reported)
  • There is some chance of false positives, 0.01% (or 1 in 10,000 uninfected people will be shown to have an infection). This implies a true negative rate of 99.99%, which is higher than the true positive rate of 99.9%

Now let’s assume India has the money, resources, time and effort to get everyone in India tested. Yes, all 1.3 billion Indians. What would that mean? Think about this before you peep into the solution.

We assume 10 in a million or 13,000 Indians are infected.  This is the true number of infections:

1. True Positive:

99.9% will be correctly diagnosed, so 12,987 infected Indians will be confirmed as infected

2. False Negative: 

0.1%  (or 13 Indians) will be incorrectly diagnosed as being uninfected while they are infected

3. False Positive:

1,29,99,87,000 are uninfected. However, the false positive rate is 0.01%. As a result, 129,999 people who do not have the infection would be diagnosed as infected. This is around 10 times the actual number of infected. So if the medical test produces even a small share of false positives, excessive testing may overwhelm the medical system and make it difficult for the actual patients to get correct treatment.

4. True Negative:

99.99% of the 1,29,99,87,000 uninfected (or 1,29,98,57,001) will be true negatives.
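
The four counts above can be reproduced with a few lines of Python. This is a sketch under the same hypothetical assumptions as the worked example (1.3 billion people tested, 13,000 truly infected, 99.9% sensitivity, 0.01% false positive rate). The function name and the final precision calculation are additions for illustration.

```python
def confusion_counts(population, infected, sensitivity, false_positive_rate):
    """Split a fully tested population into the four confusion-matrix cells."""
    uninfected = population - infected
    true_pos = round(infected * sensitivity)            # infected, test positive
    false_neg = infected - true_pos                     # infected, test negative
    false_pos = round(uninfected * false_positive_rate) # uninfected, test positive
    true_neg = uninfected - false_pos                   # uninfected, test negative
    return {"TP": true_pos, "FN": false_neg, "FP": false_pos, "TN": true_neg}


# Hypothetical inputs taken from the assumptions listed earlier
counts = confusion_counts(
    population=1_300_000_000,
    infected=13_000,
    sensitivity=0.999,
    false_positive_rate=0.0001,
)
print(counts)

# Share of positive results that are real infections (precision)
precision = counts["TP"] / (counts["TP"] + counts["FP"])
print(f"{precision:.1%}")
```

Under these assumptions only about 9% of positive results would be genuine infections, even though the test itself is more than 99.9% accurate on both infected and uninfected people.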

This shows how excessive testing and even a small false positive rate can affect us in ways we do not realize. Currently, 3% of COVID-19 tests in India lead to confirmation of an infection. In other words, we are conducting about 33 tests for every confirmed infection, which is better than most nations and in line with South Korea, among the best at testing its citizens. Put differently, 97% of the people India tests turn out not to be infected. We may infer that we are already testing more likely and possible cases than most western nations.

Are we testing enough? Should we test more? I am not a medical practitioner, and will let more capable minds decide on the testing rates.

What do you think? Would love to hear your thoughts and comments.

Disclaimer: I offer my views with the knowledge that medicine and health are not my areas of expertise. Also, given that many discussions on the topic have been polarized by political leanings and viewpoints, I would like to stress that these are not to promote any ideology.

COVID-19: Demographic Analysis https://www.jigsawacademy.com/covid-19-demographic-analysis/ https://www.jigsawacademy.com/covid-19-demographic-analysis/#respond Wed, 08 Apr 2020 07:16:48 +0000 https://analyticstraining.com/?p=15975

The post COVID-19: Demographic Analysis appeared first on Jigsaw Academy.
The COVID-19 pandemic is the biggest outbreak of our generation. As of 6th April 2020, over 1.3 million people had been infected and almost 75,000 had died due to COVID-19. The US has become the epicenter, with over 25% of the total confirmed cases.

On the evening of 6th April 2020, the Ministry of Health and Family Welfare, Government of India, shared demographic details of COVID-19 infections and deaths in India. We use this to analyse some demographic trends in India. The analysis below uses data from the press release, India’s census data, and Johns Hopkins for global data.

As per Johns Hopkins, 184 countries had COVID-19 cases as of 6th April 2020. With 4,778 infected people, India ranks 27th in terms of the number of confirmed cases. The lockdown and proactive action seem to have limited the growth of infections in India.

 

India’s death count due to COVID-19 has crossed 100. The mortality rate, calculated as the number of deaths divided by the number of confirmed cases, is around 2.8% for India, around half of the global mortality rate of 5.5% for COVID-19.

After the most recent press release from India’s Ministry of Health and Family Welfare, we are able to analyse some data in terms of gender and age groups for India.

In India, 76% of the 4,067 infections were among males and 24% among females. Given that males are more mobile and represent a larger share of the working population, we expected a larger male share. However, as males are 51.5% of India’s population, their 76% share of infections seems high.

As per census data, 5 in every million Indian males have been infected by COVID-19, but only 1.7 in every million Indian females. Do Indian females take more precautions, get tested less, or have greater resistance to COVID-19?

In terms of mortality rate, Indian men are better off. The mortality rate is 2.6% for Indian men and 3.0% for Indian women, vs a national average of 2.7%. Take care, ladies!

In terms of age, India’s Ministry of Health and Family Welfare has grouped Indians into those below 40 years, between 40-60 years and above 60 years.

Everyone can get infected. 47% of the infections are among Indians below the age of 40 years. Alarmingly, while Indians over 60 years of age account for only 19% of infections, they account for 63% of the deaths! The mortality rate of those above 60 years is 8.9%, vs 2.4% for those between 40-60 years and 0.4% for those below 40 years.

In other words, 1 in 250 people infected below the age of 40 years in India die because of COVID-19. This increases to 1 in 42 infections for Indians between the age of 40-60 years, and jumps to 1 in 11 for Indians above the age of 60 years.
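
The "1 in N" figures are simply the reciprocal of the mortality rate for each age group. A short Python sketch of the conversion, using the three rates quoted above (the function name is illustrative):

```python
def one_in(rate_percent):
    """Convert a mortality rate in percent to a rounded '1 in N' figure."""
    return round(100 / rate_percent)


# Mortality rates by age group, as quoted in the text
rates = {"below 40 years": 0.4, "40-60 years": 2.4, "above 60 years": 8.9}
for group, rate in rates.items():
    print(f"{group}: 1 in {one_in(rate)}")
```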

This appears to be in line with global trends, as Kiran Mazumdar Shaw, Executive Chairperson of Biocon Limited, has highlighted.

 

India is a young country, with a median age of 28 years. Those below 40 years therefore make up well over half of the population, so the fact that they account for less than 50% of the infections points to another disturbing fact.

We used 2011 census data to compute the infection rate by age group in India. While 2.2 people per million Indians below 40 years of age have been infected by COVID-19, the rate rises to 6.4 per million for those between 40 and 60 years, and 7.4 per million for those above 60 years. Not only are the elderly more likely to die because of COVID-19, they are also more likely to get infected.

Those above 60 years need to be especially careful. They have a higher probability of getting infected and a higher mortality rate. The need to stay home, practice social distancing and take precautionary measures is paramount. Even the younger population needs to take care to keep parents and elderly relatives from being infected.

Stay home, stay safe!

The post COVID-19: Demographic Analysis appeared first on Jigsaw Academy.

]]>
https://www.jigsawacademy.com/covid-19-demographic-analysis/feed/ 0
Who is as cheery as Santa Claus? India’s Finance Minister! https://www.jigsawacademy.com/who-is-as-cheery-as-santa-claus-indias-finance-ministers/ https://www.jigsawacademy.com/who-is-as-cheery-as-santa-claus-indias-finance-ministers/#respond Sun, 02 Feb 2020 11:26:06 +0000 https://analyticstraining.com/?p=15700 Given my background in finance, I celebrate the new year on 1st April ? and the Union Budget is as important an event as Christmas! As an NLP exercise, I decided to use the budget speeches from the last decade. What is NLP, you ask? Natural Language Processing (NLP) is an interdisciplinary branch of artificial […]

The post Who is as cheery as Santa Claus? India’s Finance Minister! appeared first on Jigsaw Academy.

]]>
Given my background in finance, I celebrate the new year on 1st April, and the Union Budget is as important an event as Christmas!

As an NLP exercise, I decided to use the budget speeches from the last decade. What is NLP, you ask? Natural Language Processing (NLP) is an interdisciplinary branch of artificial intelligence, computer science, and linguistics that helps program computers to understand, interpret, and generate human (natural) language. Do read our earlier blog post, A Quick Introduction To Natural Language Processing.

Alexa, Siri, and Google Assistant are all examples of NLP in practice. NLP has numerous applications, such as part-of-speech tagging, Named Entity Recognition (NER), question-answering, speech recognition, text-to-speech and speech-to-text, topic modeling, sentiment classification, language modeling, and translation.

In this article, we will focus on the sentiment analysis of the budget speeches by Indian Finance Ministers. We have had 12 budget presentations (including 2 interim ones) in the past ten years. I downloaded the data from the Government of India’s site. However, the Sarkar (government) does not pay much attention to detail, and the links for some years are wrong or dead! I sourced the missing speeches from national newspapers.

I used a loop to read the data and store it as a Python pandas data frame. I used regular expressions to clean the data. Using sklearn’s CountVectorizer, I created a document-term matrix excluding common English stop words. We did our analysis on the cleaned data.
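To make the idea of a document-term matrix concrete, here is a rough standard-library sketch. It is a stand-in for sklearn's CountVectorizer, not the code used for the analysis: the two sample texts, the tiny stop-word set, and the helper names `clean` and `term_counts` are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the speech texts; the real inputs were the full speeches
speeches = {
    "2010": "I propose to raise the duty. Growth in the sector was 7.2 per cent!",
    "2019": "The Government proposed a new tax scheme; the tax applies to income.",
}

# A tiny illustrative stop-word set; sklearn's built-in English list is much longer
STOP_WORDS = {"i", "to", "the", "in", "was", "a", "per"}


def clean(text: str) -> str:
    """Lower-case the text and replace everything except letters with spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def term_counts(text: str) -> Counter:
    """Count tokens of two or more letters that are not stop words."""
    return Counter(
        w for w in clean(text).split() if len(w) > 1 and w not in STOP_WORDS
    )


counts = {year: term_counts(text) for year, text in speeches.items()}
vocabulary = sorted(set().union(*counts.values()))

# Rows are documents, columns are vocabulary terms
matrix = {year: [c[term] for term in vocabulary] for year, c in counts.items()}

for year, row in matrix.items():
    print(year, dict(zip(vocabulary, row)))
```

Each row holds one speech and each column one term, which is the same shape CountVectorizer produces (there as a sparse matrix).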

Here are the top 15 words used in a few of the budget speeches:

Feb 2010 – Pranab Mukherjee

cent, propose, crore, year, duty, government, tax, sector, development, growth, budget, provide, fiscal, central

Feb 2013 – P Chidambaram

propose, crore, percent, tax, provide, government, year, sector, investment, development, funds, fund, rate, plan

Feb 2015 – Arun Jaitley

tax, crore, proposed, India, act, service, government, excise, duty, year, investment, madam, provide, credit

Jul 2019 – Nirmala Sitharaman 

tax, government, proposed, India, provide, shall, lakh, section, scheme, crore, income, act, years, year

Looking at these lists, I added some more words that I considered irrelevant to the analysis:

add_stop_words = ['crore', 'year', 'propose', 'provide', 'sector', 'lakh', 'years', 'proposed', 'new', 'cent', 'percent', 'shall']
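A small sketch of how the extra stop words change the ranking of top terms. The counts below are invented purely for illustration (they are not results from the speeches), and `top_terms` is a hypothetical helper.

```python
from collections import Counter

# Extra stop words from the article
add_stop_words = ["crore", "year", "propose", "provide", "sector", "lakh",
                  "years", "proposed", "new", "cent", "percent", "shall"]

# Hypothetical term counts for one speech (illustrative numbers, not real results)
term_counts = Counter({"tax": 120, "crore": 95, "government": 80,
                       "proposed": 60, "india": 55, "year": 40})


def top_terms(counts, extra_stop_words, n=3):
    """Return the n most frequent terms after dropping the extra stop words."""
    blocked = set(extra_stop_words)
    kept = Counter({term: c for term, c in counts.items() if term not in blocked})
    return [term for term, _ in kept.most_common(n)]


print(top_terms(term_counts, add_stop_words))
```

With scikit-learn, a common way to get the same effect is to pass `list(text.ENGLISH_STOP_WORDS.union(add_stop_words))` as the `stop_words` argument of CountVectorizer, where `text` is `sklearn.feature_extraction.text`. Whether the original analysis did exactly this is an assumption.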

Then we built word clouds for the budget speeches from the last decade.

Do you notice any trends and patterns in the word clouds? Looking at them, what other words would you remove by adding them to the add_stop_words list? Which words do you think would be among the most commonly used words in the 2020 Union Budget?

I did a short analysis of the vocabulary of Finance Ministers. It would have been interesting to see how Shashi Tharoor would have measured up if he were the Finance Minister, don’t you think?

We also did a sentiment analysis using the textblob library.

As we can see, our finance ministers are a positive lot. As cheery as Santa Claus! Ho Ho!!

As seen in 2013 and 2018, finance ministers tend to be more opinionated in the final full budget before the national elections.

Finally, we analyzed the polarity for the budget speeches.

Are you wondering what polarity is? In brief, polarity measures how positive or negative the language in a piece of text is. TextBlob reports it as a score from -1 (most negative) to +1 (most positive), so it captures both the direction and the strength of the sentiment expressed. It does appear that the mood dips towards the end of the budget speeches.
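One way to see how the mood shifts through a speech is to split the text into segments and score each one. The sketch below assumes that approach; it may not match how the original charts were built. The scorer is a toy with a made-up mini-lexicon so that the example runs on its own. With TextBlob installed, it could be swapped for `lambda t: TextBlob(t).sentiment.polarity`.

```python
from typing import Callable, List

# Hypothetical mini-lexicon, for illustration only; it is not TextBlob's lexicon
POSITIVE = {"growth", "good", "strong", "benefit"}
NEGATIVE = {"deficit", "weak", "burden", "loss"}


def toy_polarity(text: str) -> float:
    """Score text in [-1, 1] as (positive - negative) / matched words."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def polarity_by_segment(text: str, n_segments: int,
                        scorer: Callable[[str], float]) -> List[float]:
    """Split text into at most n_segments word chunks and score each chunk."""
    words = text.split()
    if not words:
        return []
    size = -(-len(words) // n_segments)  # ceiling division
    chunks = [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
    return [scorer(chunk) for chunk in chunks]


# Made-up text whose second half is more negative than its first half
speech = "strong growth good benefit deficit growth weak burden loss deficit"
print(polarity_by_segment(speech, 2, toy_polarity))
```

A falling score from the first segment to the last would correspond to the dip in mood described above.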

Interested in reading more about how TextBlob calculates sentiment and polarity? You can read more here.

What else would you do? Use a bag of words/n-grams? Use stemming and lemmatization? Or do you side with Peter Skomoroch, the Principal Data Scientist at LinkedIn?

Interested in learning more about NLP? Join the Postgraduate Program in Data Science and Machine Learning (PGPDM) course, offered by Jigsaw Academy in collaboration with the University of Chicago, which has a new module on AI and DL. NLP is covered in detail.

We also cover text analysis in IIM Indore’s Integrated Program in Business Analytics (IPBA) course.

The post Who is as cheery as Santa Claus? India’s Finance Minister! appeared first on Jigsaw Academy.

]]>
https://www.jigsawacademy.com/who-is-as-cheery-as-santa-claus-indias-finance-ministers/feed/ 0