  1. #1

    Post Defining a threshold for relevance of statistical findings

    As an exercise during my internship I conducted a survey questionnaire on a population of 40 employees to find out which initiatives they consider useful and which they consider unimportant. This means the only data relevant to me came from a descriptive analysis of the findings. By comparing the means I was able to see how important certain areas were to the employees.

    But I didn't just ask if they were important to them personally, I also wanted to know how the company was doing in the same areas. In other words, I also asked for the perceived participation of the company in area X. This way, I wanted to figure out if there are initiatives where the company is doing very little or nothing but which are considered important by the whole population.

    However, I have a limit on how many initiatives I can propose: a maximum of 3.

    So I have "designed" a threshold. I have said an initiative is considered relevant if:

    1) More than 20% of the respondents noted the participation as 2 or lower, and
    2) Less than 10% of the respondents noted that this issue is unimportant for them personally.

    This leaves me with the 6 most crucial initiatives. Since each one of them comes from an individual sub-topic, I can now take one initiative from each topic, leaving me with 3 initiatives.
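
    A minimal sketch of this two-part rule in Python. The initiative names and percentages are made-up placeholders, not the survey results, and the 20% and 10% cutoffs are the arbitrary ones described above.

```python
# Hypothetical data: (initiative, % rating participation 2 or lower,
#                     % rating the issue unimportant). Not the real survey values.
initiatives = [
    ("Initiative A", 35.0, 5.0),
    ("Initiative B", 15.0, 0.0),
    ("Initiative C", 40.0, 19.0),
    ("Initiative D", 25.0, 2.5),
]

MIN_LOW_PARTICIPATION = 20.0  # rule 1: more than 20% rate participation <= 2
MAX_UNIMPORTANT = 10.0        # rule 2: fewer than 10% call the issue unimportant


def is_relevant(low_participation_pct, unimportant_pct):
    """Return True only if both threshold conditions hold."""
    return (low_participation_pct > MIN_LOW_PARTICIPATION
            and unimportant_pct < MAX_UNIMPORTANT)


relevant = [name for name, low, unimp in initiatives if is_relevant(low, unimp)]
print(relevant)
```

    Both comparisons are strict, so an initiative sitting exactly on a cutoff is excluded.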

    Great, but: I cannot base my threshold on any literature. It's just something arbitrary I made up in my mind because it sounded nice. This won't fly.
    So my question is: Is there any literature where I can read up on how to define such a threshold professionally? Because I just could not find a damn thing online or in my books.

    Whenever I search for "significance" or "threshold" I end up at an article about the p-value, which is not what I am looking for. I am not looking for any correlations between two items here. I just need a threshold value to omit unwanted results.

    I know a lot of people here are good at this, so my hope is that one of them reads this :P

    Please push me in the right direction.

    - - - Updated - - -

    nobody?

    Just saying that there is no apparent literature would already be enough

  2. #2
    Deleted
    I don't get what your problem is. If you can only take 3 initiatives, you present the 3 they felt the strongest about, if you can't cross-reference with profit share or other economic considerations like the cost of the initiative.

    The top three initiatives that are perceived as most necessary across all subsets, isn't that a simple ranking?
    The limit of three was artificially imposed, so that's your threshold of what is relevant and what's not.

    It's a matter of presentation space, not necessarily one of relevance. All data is relevant, but you can only present a limited amount in a limited timeframe to keep it conclusive.
    Last edited by mmocd79acbf389; 2015-01-11 at 12:24 AM.

  3. #3
    Quote Originally Posted by Davillage View Post
    I don't get what your problem is. If you can only take 3 initiatives, you present the 3 they felt the strongest about, if you can't cross-reference with profit share or other economic considerations like the cost of the initiative.

    The top three initiatives that are perceived as most necessary across all subsets, isn't that a simple ranking?
    The limit of three was artificially imposed, so that's your threshold of what is relevant and what's not.

    It's a matter of presentation space, not necessarily one of relevance. All data is relevant, but you can only present a limited amount in a limited timeframe to keep it conclusive.
    The problem is that my data set has two different measurements. I don't only measure which initiatives the company is participating in, but also which ones are perceived as useful. However, the one that has the lowest participation is not the one that is considered the most useful, and so on.

    Which is why I have to have two thresholds and only if both are true, I recommend the initiative.

    And I decided totally arbitrarily to omit all that are considered useless by more than 10% of the population. Is it really okay this way?

  4. #4
    Deleted
    Can't you make an accumulated score of both?

    a*b or a+b

    It's more abstract but less arbitrary that way.
    Last edited by mmocd79acbf389; 2015-01-11 at 01:11 AM.

  5. #5
    "a" needs to be as high as possible, while "b" needs to be as low as possible.

    I don't think the accumulated value would help me here?

  6. #6
    Deleted
    If you can say which has more weight, you could make a scoring system that factors that in anyway.

    If it should just be linear, you can add both rankings.
    The top rank for b is the lowest value; the top rank for a is the highest.

    It's just a matter of sorting, no higher math involved.

    Maybe this helps to look something up and dig deeper.
    http://stackoverflow.com/questions/8...ighted-sorting
    Last edited by mmocd79acbf389; 2015-01-11 at 01:29 AM.

  7. #7
    Quote Originally Posted by Davillage View Post
    If you can say which has more weight, you could make a scoring system that factors that in anyway.

    If it should just be linear, you can add both rankings.

    The top rank for b is the lowest value; the top rank for a is the highest.
    Unfortunately I can't say which has more weight. They are all equally important. Both 'a' and 'b' have to be treated as the same, as well as all initiatives. After all, even though the company doesn't participate in an initiative, if the population considers it useless to begin with, it is not worth investing in it.

    O, why did they make me do such a shitty assignment. I loved my correlation analysis and was so happy that I was able to work with it. Now suddenly this crap.


    You know what, I'll just post the table.



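    And now I have to pick the three least participated in, but most important ones. This means a high value in the participation column (the share who rated participation 2 or lower), while at the same time a low value in the importance column (the share who rated the issue unimportant).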

  8. #8
    Why not have usefulness/participation as a result?

    A: 10/10 = 1 (high usefulness, high participation)
    B: 10/1 = 10 (high usefulness, low participation)
    C: 1/1 = 1 (low usefulness, low participation)

    This gives B > A = C. That way you should get the ranking for initiatives that are currently not participated in but most useful.
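
    As a quick check of the example above, here is a small Python sketch that computes usefulness/participation for A, B and C and sorts by it. The scores are the illustrative ones from this post, not survey data.

```python
# Illustrative (usefulness, participation) scores from the example above.
items = {"A": (10, 10), "B": (10, 1), "C": (1, 1)}

# The ratio is large when usefulness is high and participation is low.
ratios = {name: useful / part for name, (useful, part) in items.items()}

# Sort from highest to lowest ratio; tied items keep their original order.
ranking = sorted(ratios, key=ratios.get, reverse=True)
print(ratios)
print(ranking)
```

    Note that the ratio is undefined when participation is 0, so this only works on a scale that starts above zero.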

    #Edit: looking through your chart.

  9. #9
    Deleted
    1.1: 15,4 + 5,3 = 20,7
    2.1: 17,9 + 5,3 = 23,2
    2.2: 15,8 + 5,3 = 21,1
    2.3: 33,3 + 5,3 = 38,6

    Resolving grievances has the most weight in this list if both factors are weighted equally.

    Responsible political involvement is number 1, jumping ahead of the list by far given the above weighting.

    banur's ratio is better.
    Last edited by mmocd79acbf389; 2015-01-11 at 01:51 AM.

  10. #10
    Quote Originally Posted by Davillage View Post
    1.1: 15,4 + 5,3 = 20,7
    2.1: 17,9 + 5,3 = 23,2
    2.2: 15,8 + 5,3 = 21,1
    2.3: 33,3 + 5,3 = 38,6

    Resolving grievances has the most weight in this list if both factors are weighted equally.

    Responsible political involvement is number 1, jumping ahead of the list by far given the above weighting.

    banur's ratio is better.
    That's an issue I have with, for example, Responsible political involvement. The 19% unimportance is just too high to give it any consideration, seeing as the population has a much greater desire for other initiatives.

    Also, with this sum, if we count Civil and political rights, it gives 28,2 + 2,6 = 30,8.
    If we now count Employment and employment relationships, we get 28,2 + 5,3 = 33,5.

    This would indicate that the latter is more important, but it isn't, since the second value (here 5,3) is better the lower it is.
    Last edited by StayTuned; 2015-01-11 at 01:54 AM.

  11. #11
    Deleted
    Quote Originally Posted by StayTuned View Post
    That's an issue I have with, for example, Responsible political involvement. The 19% unimportance is just too high to give it any consideration, seeing as the population has a much greater desire for other initiatives.
    64,9/19,4 ~3,35

    But it would value the zeros in 'b' very very high xD
    a would become unimportant that way for the outcome.
    Last edited by mmocd79acbf389; 2015-01-11 at 01:56 AM.

  12. #12
    Quote Originally Posted by Davillage View Post
    64,9/19,4 ~3,35

    But it would value the zeros very very high xD
    a would become unimportant that way for the outcome.
    Exactly, which is also not something I can do because sometimes a 0, which means it is really important, is already really high in actual participation. I can't recommend something they are already doing

  13. #13
    Deleted
    Quote Originally Posted by StayTuned View Post
    Exactly, which is also not something I can do because sometimes a 0, which means it is really important, is already really high in actual participation. I can't recommend something they are already doing
    You got only 5 different values for b?

  14. #14
    There is a whole research area in computer science about multiple-criteria decision analysis (http://en.wikipedia.org/wiki/Multipl...ision_analysis), which is basically what you're doing. There are some things you can do.

    What I would recommend you do first is sort them by one of the criteria, breaking ties by the other, so that the best in the first criterion is at the top and, where there is a tie in the first criterion, the best in the second criterion is higher. Now keep the first entry in the list and, going from top to bottom, eliminate any entry that is not better in its second criterion than the last one you kept. The result is that you have eliminated every item that is objectively worse than some other item in your list, and what you're left with is called a Pareto front.
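
    The sort-and-sweep described above can be sketched in Python roughly like this. It assumes two criteria, with a to be maximized and b to be minimized; the candidate names and values are hypothetical, not the survey table.

```python
def pareto_front(items):
    """Return the non-dominated items.

    Each item is (name, a, b), where a should be as high as possible
    and b as low as possible.
    """
    # Best a first; among equal a, the lowest b first.
    ordered = sorted(items, key=lambda item: (-item[1], item[2]))

    front = []
    for item in ordered:
        # Keep the first entry, then only entries whose b is strictly
        # better (lower) than the b of the last entry kept.
        if not front or item[2] < front[-1][2]:
            front.append(item)
    return front


# Hypothetical values for illustration only, not the survey table.
candidates = [
    ("P", 40.0, 10.0),
    ("Q", 35.0, 5.0),
    ("R", 30.0, 8.0),
    ("S", 30.0, 2.0),
    ("T", 20.0, 0.0),
    ("U", 20.0, 3.0),
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

    In this made-up data R and U drop out, because S and T match them in a and are better in b. Exact duplicates of a kept entry are also dropped by the strict comparison.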

    Once you've done this, you're left with a set in which you can no longer objectively say that any one item is better than any other. With this you can do two things. First, you should look at the trade-offs. Since you already sorted them, you can see that going from one item in the list to the next sometimes makes you a little worse in one criterion and a lot better in the other, and sometimes a lot worse in the first and only a tiny bit better in the other. Ditch the bad trade-offs and keep the better ones.


    You can also come up with a formula like Davillage suggested, but using weights v and w and computing v*a + w*b (use a negative weight for one criterion if it is to be minimized while the other is maximized).
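
    A minimal sketch of such a weighted score in Python. The default weights v = 1 and w = -1 are only an assumption standing in for "equal importance", and the (a, b) pairs are the ones quoted earlier in this thread, assuming the first number in each sum is a and the second is b.

```python
def weighted_score(a, b, v=1.0, w=-1.0):
    """Combine the two criteria as v*a + w*b.

    a is to be maximized and b minimized, so w is negative by default.
    The default weights are an assumption (equal importance), not a
    recommendation.
    """
    return v * a + w * b


# (a, b) pairs quoted in this thread, assuming the first number in each
# sum is a and the second is b.
rows = {
    "1.1": (15.4, 5.3),
    "2.1": (17.9, 5.3),
    "2.2": (15.8, 5.3),
    "2.3": (33.3, 5.3),
    "Civil and political rights": (28.2, 2.6),
}

scores = {name: weighted_score(a, b) for name, (a, b) in rows.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.1f}")
```

    Changing v and w changes the ranking, which is exactly why the relative importance of the criteria has to come from somewhere.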

    Either way though, you're going to need some understanding of the relative importance of your criteria, and I'm afraid I can't help you with that.

    NOTE: after you pick one you should start over with all items back in the mix. One of the ones you eliminated may now be the new best.
    I don't think this matters nearly as much as you think it does.

  15. #15
    Quote Originally Posted by Davillage View Post
    You got only 5 different values for b?
    The table is a bit longer, there are some extra values. Not too many though: the population is rather small and most of them answered above 2. Only a few people actually voted 2 or less, resulting in what you see.

    - - - Updated - - -

    Quote Originally Posted by zoefschildpad View Post
    There is a whole research area in computer science about multiple-criteria decision analysis (http://en.wikipedia.org/wiki/Multipl...ision_analysis), which is basically what you're doing. There are some things you can do.

    What I would recommend you do first is sort them by one of the criteria, breaking ties by the other, so that the best in the first criterion is at the top and, where there is a tie in the first criterion, the best in the second criterion is higher. Now keep the first entry in the list and, going from top to bottom, eliminate any entry that is not better in its second criterion than the last one you kept. The result is that you have eliminated every item that is objectively worse than some other item in your list, and what you're left with is called a Pareto front.

    Brutal, but seems to be what I need. I will read through the literature tomorrow. Thank you very much.

    Unfortunately I can't use any weights, otherwise my problem would be much easier to solve. It is because I have to treat them all equally that I have these issues coming up with a solution.

    Also, there seems to be an option for pareto in SPSS. If it works this way, my job is done.

  16. #16
    Why are you using "percentage of people who said 2 or less" rather than the average scores that people gave? seems like the number 2 is an arbitrary barrier...

  17. #17
    Deleted
    Quote Originally Posted by StayTuned View Post
    The table is a bit longer, there are some extra values. Not too many though: the population is rather small and most of them answered above 2. Only a few people actually voted 2 or less, resulting in what you see.

    - - - Updated - - -




    Brutal, but seems to be what I need. I will read through the literature tomorrow. Thank you very much.

    Unfortunately I can't use any weights, otherwise my problem would be much easier to solve. It is because I have to treat them all equally that I have these issues coming up with a solution.

    Also, there seems to be an option for pareto in SPSS. If it works this way, my job is done.
    You don't need weights, that makes it a lot easier.

    Sort both and give each item a rank according to where it lands. Then add both ranks, done.
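
    A minimal sketch of this rank-sum idea in Python. How to handle ties is not specified here, so letting tied values share a rank is an assumption. The (a, b) pairs are the ones quoted in this thread, assuming 64,9 and 19,4 are the a and b values for Responsible political involvement.

```python
def competition_ranks(values, higher_is_better):
    """Rank 1 is best; tied values share the same rank."""
    ranks = {}
    for name, value in values.items():
        if higher_is_better:
            better = sum(1 for other in values.values() if other > value)
        else:
            better = sum(1 for other in values.values() if other < value)
        ranks[name] = better + 1
    return ranks


# (a, b) pairs quoted in this thread; a high is good, b low is good.
rows = {
    "1.1": (15.4, 5.3),
    "2.1": (17.9, 5.3),
    "2.2": (15.8, 5.3),
    "2.3": (33.3, 5.3),
    "Civil and political rights": (28.2, 2.6),
    "Responsible political involvement": (64.9, 19.4),
}

rank_a = competition_ranks({n: a for n, (a, _) in rows.items()}, higher_is_better=True)
rank_b = competition_ranks({n: b for n, (_, b) in rows.items()}, higher_is_better=False)

# Lowest combined rank is best.
rank_sum = {name: rank_a[name] + rank_b[name] for name in rows}
for name in sorted(rank_sum, key=rank_sum.get):
    print(name, rank_a[name], rank_b[name], rank_sum[name])
```

    With so many identical b values the combined ranks tie easily, so a tie-break rule may still be needed.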
    Last edited by mmocd79acbf389; 2015-01-11 at 02:18 AM.

  18. #18
    Quote Originally Posted by zoefschildpad View Post
    Why are you using "percentage of people who said 2 or less" rather than the average scores that people gave? seems like the number 2 is an arbitrary barrier...
    That is just the cutoff for people who don't agree with the question.
    So the first question was: "How would you rate our participation in X?" A 2 means low participation.
    The second one was: "How important do you see this issue?" A 2 means unimportant.
    The problem is that you want to combine low participation with high importance, the opposite ends, rather than those two.

    The order from the table should be:
    1) Sustainable resource use
    2) Property rights
    3) Restoration and both Health and Safety
    Last edited by banur; 2015-01-11 at 02:20 AM.

  19. #19
    Quote Originally Posted by zoefschildpad View Post
    Why are you using "percentage of people who said 2 or less" rather than the average scores that people gave? seems like the number 2 is an arbitrary barrier...
    Because the whole "study" is meant to improve the retention of employees. I need to focus on critical cases, where employees express a dire dissatisfaction with the current situation.

  20. #20
    Deleted
    Quote Originally Posted by banur View Post
    The order from the table should be:
    1) Sustainable resource use
    2) Property rights
    3) Restoration and both Health and Safety
    ^Labor
    3) Protection of the Environment
    3) Protecting Health and Safety (consumer)
    Last edited by mmocd79acbf389; 2015-01-11 at 02:29 AM.
