The “null hypothesis” is the hypothesis in research (and statistics) that claims there is no statistically significant relationship between the independent variables under study and the observed results or data collected.
There is an assumption that the null hypothesis is true, unless research findings indicate otherwise. Rejecting the null hypothesis can be the central task of research.
The null hypothesis is denoted in statistics as H₀.
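The logic of testing against the null hypothesis can be sketched with a simple permutation test in plain Python. The two groups and their scores below are made-up illustration data, not from any real study:

```python
import random
import statistics

# Hypothetical scores for two groups (made-up illustration data).
group_a = [12.1, 11.8, 13.0, 12.5, 12.9, 13.2]
group_b = [11.0, 11.4, 10.9, 11.7, 11.2, 11.5]

observed_diff = statistics.mean(group_a) - statistics.mean(group_b)

# Under the null hypothesis, the group labels are interchangeable, so we
# shuffle the pooled scores many times and count how often a difference
# at least as large as the observed one arises by chance alone.
pooled = group_a + group_b
random.seed(0)
n_iter = 10_000
extreme = 0
for _ in range(n_iter):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:len(group_a)]) - statistics.mean(pooled[len(group_a):])
    if abs(diff) >= abs(observed_diff):
        extreme += 1

# The "+1" keeps the estimate away from exactly zero (a common correction).
p_value = (extreme + 1) / (n_iter + 1)
# A small p_value is evidence against the null hypothesis of no difference.
```

A p_value below a chosen threshold (commonly 0.05) would lead us to reject the null hypothesis for these particular data.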
First, what is a dummy variable and why do we need them?
What does machine learning do with labels like “United States” when trying to figure out how to process data? These models cannot use these labels in mathematical operations. “1 + United States” does not have a result. So, these labels (commonly referred to as “categorical” variables) need to be converted to something upon which operations can occur.
Let’s make a very simple example. You are trying to use (multiple) linear regression to figure out the effects on the salaries of workers of the following variables:
- the countries in which the workers are employed
- the age of the workers
- the number of years the workers have been on the job
You have a list of salaries, and you want to plot them and use machine learning so that, in future, you can estimate salary by country, age and years on the job. Salary is your dependent variable (the one you want to watch change in response to changes in the other variables). The other variables are your independent variables.
With dummy variables:
As already noted, categorical variables need to be converted to numerical values. We do not want to do this in a single column, as our machine learning model might infer an ordering between the categories. If “United States” is given a value of “1” and “Canada” a value of “2”, “United States” might be treated as numerically more (or less, depending on our logic) significant. To resolve this issue, we create “dummy variables”, giving each category its own column containing a 0 or 1 (0 if “no” and 1 if “yes”). Our dataset, which contains the dummy variables, might look like the following:
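That encoding can be sketched in plain Python. The rows, column names and salary figures below are made up for illustration, not taken from any real dataset:

```python
# Hypothetical rows: country, age, years on the job and salary (made-up values).
rows = [
    {"Country": "United States", "Age": 44, "Years": 10, "Salary": 72000},
    {"Country": "Canada",        "Age": 27, "Years": 3,  "Salary": 48000},
    {"Country": "Canada",        "Age": 30, "Years": 5,  "Salary": 54000},
]

countries = sorted({row["Country"] for row in rows})

def one_hot(row):
    """Replace the Country label with one 0/1 dummy column per country."""
    encoded = {f"Country_{c}": int(row["Country"] == c) for c in countries}
    encoded["Age"] = row["Age"]
    encoded["Years"] = row["Years"]
    encoded["Salary"] = row["Salary"]
    return encoded

encoded_rows = [one_hot(r) for r in rows]
```

In practice, a library call such as pandas’ `get_dummies` does the same job in one line.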
Second, what is the trap?
Imagine that you have a dataset with the constant “1” and dummy columns for “male” and “female”. In every row, the male and female columns add up to “1”, which exactly duplicates the constant column. This perfect collinearity makes the regression equation unsolvable. The solution? Remove either the constant or one of the dummy variables. Back to our example – like the male versus female case, the country in our dataset must be either “United States” or “Canada”, so we can remove one of these dummy columns to avoid the Dummy Variable Trap.
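The linear dependence behind the trap can be checked directly. The design-matrix columns below are made-up values for four hypothetical workers:

```python
# Made-up design-matrix columns for four workers.
constant = [1, 1, 1, 1]
male     = [1, 0, 1, 0]
female   = [0, 1, 0, 1]

# In every row the two dummies sum to the constant column, so the columns
# are linearly dependent and ordinary least squares has no unique solution.
dependent = all(m + f == c for m, f, c in zip(male, female, constant))
```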
With constant and both dummy variables:
With constant and one dummy variable (United States dummy variable removed):
We have now avoided the Dummy Variable Trap in this dataset!
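Dropping one dummy can be sketched as follows. The column names are hypothetical; the dropped category becomes the baseline absorbed by the constant (intercept):

```python
countries = ["Canada", "United States"]
baseline = "United States"  # dummy removed to avoid the trap
kept = [c for c in countries if c != baseline]

def encode(country):
    """Encode a country using only the k-1 kept dummy columns."""
    return {f"Country_{c}": int(country == c) for c in kept}

# A Canadian worker gets Country_Canada = 1; a United States worker gets
# 0 for every kept dummy and is represented by the intercept alone.
```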
While working through how to add machine learning to my mental health app, I came across the course “Machine Learning A-Z™: Hands-On Python & R In Data Science” at Udemy, found at:
The course looked like exactly what I wanted, so I signed up and got another course (“Deep Learning”) included in the bundle. I am about 20 videos into the ML course and I am loving it!
I have already installed the two major (and open source) IDEs (Integrated Development Environments) used in the course:
Anaconda for Python: https://www.anaconda.com/
RStudio for R: https://www.rstudio.com/
I have also begun to work through datasets in both Python and R. Great stuff!
It is almost the end of the day here in Aotearoa New Zealand. A decade ago, I wouldn’t have given a day highlighting mental health a second thought. I was one of those people who spent the first four decades of their life without any significant mental health concerns. Sure, I felt a bit anxious on that sixth cup of coffee, and sure, I had dealt with bullies in primary school, but that was it. I didn’t realise how easy I had it. I also didn’t have much empathy or compassion for the suffering of others.
Ten years ago, my world collapsed. I experienced PTSD as my marriage ended. In the depths of despair, I decided to become a counsellor to help others.
Mental Health Day is now front and centre in my thoughts. I can imagine the suffering of others, because I have felt it myself. I have also counselled others in person and from a distance. The burdens others carry can be unimaginable and when we try to understand, we tend to pull back in fear. One of the first things I learned as a counsellor was not to be a problem-solver. People in distress get more than enough of those interactions. Sure, help, but don’t feel a need to fill every second with speaking and don’t tell them “all you need to do is . . . ” That rubbish is generally unwelcome.
How can you help? Learn about mental health issues. Volunteer to just be with those suffering. Do things to make their struggles a bit easier – be it offering them a cuppa, listening without advising, making a meal, bringing in their washing from the clothesline. There are so many ways to help others in need.
If we are lucky enough not to be struggling ourselves right now, we certainly will, given enough life. Help someone up, asking nothing in return. When you someday get the same, you will savour it that much more.
For those struggling – you are not alone.
The extent to which the results of a research experiment can be generalised to other contexts.
Contrast with internal validity.
The extent to which the results of a research experiment can be attributed to the independent variable under consideration, rather than to some confounding variable, through minimisation of systematic error (bias).
Contrast with external validity.
When researchers (or the public) assume a relationship exists between variables that occur together, when there is actually no correlation between them. Such assumed relationships can be driven by personal biases, racism, “common sense” or other cognitive biases.