asked ago in General Economics Questions by (120 points)
Hi everyone, I’m working with SIPP to look at people who move during the year. According to the Census definitions, if TMOVER is between 2 and 7, the person is a mover, and the month they start their new residence is given by ERH_BMONTH.

Using these variables, I get a sensible hump-shaped seasonal pattern in most years, with fewer moves in winter and more in summer. It looks good for 2018 through 2024.
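A minimal sketch of this kind of tabulation, for readers who want to reproduce it. It assumes the data are already loaded as a list of dicts with `TMOVER` and `ERH_BMONTH` fields (a hypothetical in-memory layout, not the SIPP file format), and `move_month_shares` is a hypothetical helper name:

```python
from collections import Counter

# TMOVER values 2-7 are treated as movers, following the definition in the question.
MOVER_CODES = range(2, 8)


def move_month_shares(records):
    """Share of mover records by ERH_BMONTH (months 1-12)."""
    counts = Counter(
        r["ERH_BMONTH"] for r in records if r["TMOVER"] in MOVER_CODES
    )
    total = sum(counts.values())
    if total == 0:
        return {m: 0.0 for m in range(1, 13)}
    return {m: counts.get(m, 0) / total for m in range(1, 13)}


if __name__ == "__main__":
    sample = [
        {"TMOVER": 1, "ERH_BMONTH": 3},  # outside 2-7, so ignored
        {"TMOVER": 2, "ERH_BMONTH": 6},
        {"TMOVER": 4, "ERH_BMONTH": 7},
        {"TMOVER": 3, "ERH_BMONTH": 7},
        {"TMOVER": 5, "ERH_BMONTH": 1},
    ]
    shares = move_month_shares(sample)
    # Print only the months that actually have moves.
    print({m: round(s, 2) for m, s in shares.items() if s})
    # prints {1: 0.25, 6: 0.25, 7: 0.5}
```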

But when I use the same approach on the 2014 SIPP, Waves 2–4, the distribution looks strange: a huge spike in January (~20% of moves), very high shares in March and April, and very low values in autumn. It makes no sense.

Since TMOVER and ERH_BMONTH have the same definitions in those waves, it’s not a coding or variable issue. Could this be a sample design problem with those panels? Has anyone else seen this before or figured out what’s going on? Maybe I am missing something straightforward.

1 Answer

answered ago by (440 points)
I don’t know the answer, but this is what an AI assistant said:

I would not assume this is purely a coding mistake, but I also would not jump straight to “sample design problem” without checking a few SIPP-specific issues.

The main thing I would check is whether you are using ERH_BMONTH as the move month in a way that creates a January/start-of-spell artifact. ERH_BMONTH is the beginning month of the residence spell, while TMOVER is the monthly mover flag. For identifying the actual month of a move, I would first try using the month of the person-month record where TMOVER is 2–7, rather than using ERH_BMONTH as the timing variable. Then compare that to ERH_BMONTH as a diagnostic.
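A rough sketch of that comparison: cross-tabulate the record month against ERH_BMONTH for mover person-months and see where they disagree. It assumes person-month dicts with `MONTHCODE` (an assumed name for the 1–12 month of the record), `TMOVER`, and `ERH_BMONTH`; the helper names are hypothetical:

```python
from collections import Counter

# TMOVER values 2-7 are treated as movers, following the definition in the question.
MOVER_CODES = range(2, 8)


def timing_crosstab(person_months):
    """Count (record month, ERH_BMONTH) pairs among mover person-months.

    MONTHCODE is an assumed column name for the month of the record.
    """
    return Counter(
        (r["MONTHCODE"], r["ERH_BMONTH"])
        for r in person_months
        if r["TMOVER"] in MOVER_CODES
    )


def agreement_rate(crosstab):
    """Share of mover person-months where both timing measures agree."""
    total = sum(crosstab.values())
    if total == 0:
        return float("nan")
    same = sum(n for (rec_m, b_m), n in crosstab.items() if rec_m == b_m)
    return same / total


def disagreements_by_bmonth(crosstab):
    """Disagreements grouped by ERH_BMONTH, to see whether they pile up at 1."""
    out = Counter()
    for (rec_m, b_m), n in crosstab.items():
        if rec_m != b_m:
            out[b_m] += n
    return out


if __name__ == "__main__":
    rows = [
        {"MONTHCODE": 6, "TMOVER": 2, "ERH_BMONTH": 6},
        {"MONTHCODE": 8, "TMOVER": 3, "ERH_BMONTH": 1},
        {"MONTHCODE": 9, "TMOVER": 2, "ERH_BMONTH": 1},
        {"MONTHCODE": 4, "TMOVER": 1, "ERH_BMONTH": 4},  # not a mover
    ]
    tab = timing_crosstab(rows)
    print(round(agreement_rate(tab), 2))  # prints 0.33
    print(dict(disagreements_by_bmonth(tab)))  # prints {1: 2}
```

If most disagreements sit at ERH_BMONTH = 1 while the record month is spread across the year, that would be consistent with a start-of-spell artifact rather than real January moves.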

I would also split the results by wave rather than pooling Waves 2–4. In the 2014 panel, Wave 2 covers 2014, Wave 3 covers 2015, and Wave 4 covers 2016. If the January/March/April pattern is concentrated in one wave, that points toward a wave-specific processing, recall, or weighting issue rather than a general mover definition issue.
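A sketch of the per-wave split, with unweighted, weighted, and within-wave-normalized versions. It assumes person-month dicts with `SWAVE` (assumed wave identifier), `TMOVER`, a month column (`MONTHCODE` by default, or `ERH_BMONTH`), and a person weight column; `WPFINWGT` is used as an assumed weight name, and the function names are hypothetical:

```python
from collections import defaultdict

# TMOVER values 2-7 are treated as movers, following the definition in the question.
MOVER_CODES = range(2, 8)


def month_shares_by_wave(records, month_field="MONTHCODE", weight_field=None):
    """Per-wave share of moves by month.

    weight_field=None gives unweighted counts; otherwise the named weight
    column is summed instead of counting records.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        if r["TMOVER"] in MOVER_CODES:
            w = 1.0 if weight_field is None else r[weight_field]
            totals[r["SWAVE"]][r[month_field]] += w
    shares = {}
    for wave, by_month in totals.items():
        wave_total = sum(by_month.values())
        shares[wave] = {m: by_month.get(m, 0.0) / wave_total for m in range(1, 13)}
    return shares


def pooled_shares_normalized(records, month_field="MONTHCODE", weight_field="WPFINWGT"):
    """Pool waves after rescaling weights so each wave's records sum to 1.

    Normalization uses all records in the wave (movers and non-movers), so a
    wave does not dominate the pooled result just because its raw weights
    sum to a larger total.
    """
    wave_weight = defaultdict(float)
    for r in records:
        wave_weight[r["SWAVE"]] += r[weight_field]
    by_month = defaultdict(float)
    for r in records:
        if r["TMOVER"] in MOVER_CODES:
            by_month[r[month_field]] += r[weight_field] / wave_weight[r["SWAVE"]]
    total = sum(by_month.values())
    if total == 0:
        return {m: 0.0 for m in range(1, 13)}
    return {m: by_month.get(m, 0.0) / total for m in range(1, 13)}
```

Comparing the three versions wave by wave shows whether the January spike is tied to one wave, to the weights, or to pooling.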

A few diagnostics I would run:

1. Tabulate movers by month separately for Wave 2, Wave 3, and Wave 4.

2. Compare three definitions:

   * TMOVER = 2–7, dated by the month of the person-month record;
   * a change in ERESIDENCEID from month t to t+1;
   * ERH_BMONTH, the beginning month of the residence spell.

3. Check whether the January spike is mostly returning sample members, new respondents, imputed cases, or Type 2/part-year residents.

4. Check whether the spike survives using unweighted counts, person weights, and normalized within-wave weights.

5. Check whether ERH_BMONTH = 1 is picking up people whose spell began before the reference year or at the first observed month, rather than people who truly moved in January.

6. Drop January as a sensitivity test and see whether the remaining seasonal pattern becomes plausible.
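Diagnostics 2 and 6 can be sketched as follows. The first function dates a move to the first month a person is observed at a new ERESIDENCEID; the second renormalizes monthly counts after dropping January. It assumes person-month dicts with `SWAVE`, `SSUID`, `PNUM` (assumed wave and person identifiers), `MONTHCODE` (assumed record month), and `ERESIDENCEID`; the function names are hypothetical:

```python
from collections import Counter, defaultdict


def moves_from_residence_id(person_months):
    """Count moves by month as a change in ERESIDENCEID from month t to t+1.

    Records are grouped by (SWAVE, SSUID, PNUM), so comparisons stay within
    one person's reference year. A move is dated to t+1, the first month at
    the new residence ID. January is never flagged here, because there is
    no earlier month in the same wave to compare against.
    """
    by_person = defaultdict(dict)
    for r in person_months:
        key = (r["SWAVE"], r["SSUID"], r["PNUM"])
        by_person[key][r["MONTHCODE"]] = r["ERESIDENCEID"]
    counts = Counter()
    for months in by_person.values():
        for m in range(2, 13):
            prev, cur = months.get(m - 1), months.get(m)
            if prev is not None and cur is not None and prev != cur:
                counts[m] += 1
    return counts


def shares_excluding(counts, excluded=(1,)):
    """Renormalize monthly counts after dropping some months (January by default)."""
    kept = {m: n for m, n in counts.items() if m not in excluded}
    total = sum(kept.values())
    return {m: n / total for m, n in sorted(kept.items())} if total else {}
```

Because the change-based measure cannot see January moves within a single wave, it is best read alongside the TMOVER-based count rather than as a replacement for it.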

My guess is that the issue is related to the redesigned 2014 SIPP’s annual reference period and residence-spell construction, not a change in the formal definition of TMOVER. The 2014 panel moved to a one-year reference period, and that redesign created known concerns about recall, transitions, and panel attrition. So identical variable names and labels do not necessarily mean the month-of-move distribution is comparable to later SIPP files without additional checks.

The key distinction is this:

TMOVER flags whether the respondent moved in a given month.

ERH_BMONTH identifies the beginning month of the residence spell.

Those are usually consistent, but they are not conceptually identical. If you are trying to produce a seasonal distribution of moves, I would treat TMOVER by person-month as the primary measure and ERH_BMONTH as a spell-level consistency check.
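One way to use ERH_BMONTH as a spell-level consistency check, and to probe point 5 above, is to count people whose spell starts in January but who have no January mover flag. A sketch, assuming person-month dicts with `SWAVE`, `SSUID`, `PNUM`, `MONTHCODE` (assumed names), `TMOVER`, and `ERH_BMONTH`; the function name is hypothetical, and a person with several spells is counted once, so the result is approximate:

```python
# TMOVER values 2-7 are treated as movers, following the definition in the question.
MOVER_CODES = range(2, 8)


def unbacked_january_starts(person_months):
    """Return (persons with ERH_BMONTH == 1 but no January TMOVER 2-7 flag,
    all persons with ERH_BMONTH == 1).

    A high first number may mean ERH_BMONTH = 1 marks the start of the
    observed window, or a spell that began earlier, rather than a January move.
    """
    jan_start = set()
    jan_move = set()
    for r in person_months:
        key = (r["SWAVE"], r["SSUID"], r["PNUM"])
        if r["ERH_BMONTH"] == 1:
            jan_start.add(key)
        if r["MONTHCODE"] == 1 and r["TMOVER"] in MOVER_CODES:
            jan_move.add(key)
    return len(jan_start - jan_move), len(jan_start)
```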

So I would not say “the 2014 data are unusable.” I would say: first rule out a spell-start artifact, then rule out pooling/weighting across waves, then check whether the remaining pattern is a real 2014-panel redesign artifact.
...