I don't know the answer myself, but this is what an AI said.
I would not assume this is purely a coding mistake, but I also would not jump straight to “sample design problem” without checking a few SIPP-specific issues.
The main thing I would check is whether you are using ERH_BMONTH as the move month in a way that creates a January/start-of-spell artifact. ERH_BMONTH is the beginning month of the residence spell, while TMOVER is the monthly mover flag. For identifying the actual month of a move, I would first try using the month of the person-month record where TMOVER is 2–7, rather than using ERH_BMONTH as the timing variable. Then compare that to ERH_BMONTH as a diagnostic.
I would also split the results by wave rather than pooling Waves 2–4. In the 2014 panel, Wave 2 covers 2014, Wave 3 covers 2015, and Wave 4 covers 2016. If the January/March/April pattern is concentrated in one wave, that points toward a wave-specific processing, recall, or weighting issue rather than a general mover definition issue.
A few diagnostics I would run:
1. Tabulate movers by month separately for Wave 2, Wave 3, and Wave 4.
2. Compare three definitions:
* TMOVER = 2–7 by the monthly record month;
* change in ERESIDENCEID from month t to t+1;
* ERH_BMONTH for the residence spell.
3. Check whether the January spike is mostly returning sample members, new respondents, imputed cases, or Type 2/part-year residents.
4. Check whether the spike survives using unweighted counts, person weights, and normalized within-wave weights.
5. Check whether ERH_BMONTH = 1 is picking up people whose spell began before the reference year or at the first observed month, rather than people who truly moved in January.
6. Drop January as a sensitivity test and see whether the remaining seasonal pattern becomes plausible.
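Diagnostics 1 and 2 could be sketched roughly like this in Python. This is only an illustration, not a tested SIPP workflow: it assumes the person-month file is already loaded as a list of dicts with numeric fields converted, and that your extract carries SSUID, PNUM, SWAVE, and MONTHCODE alongside TMOVER, ERESIDENCEID, and ERH_BMONTH. The function name and the toy rows are made up, counts are unweighted, and the ERESIDENCEID comparison assumes every person has a record for every month.

```python
from collections import Counter, defaultdict

def movers_by_month(rows):
    """Count moves per month, by wave, under three definitions (unweighted).

    Returns {definition: {wave: Counter(month -> count)}}.
    """
    out = {d: defaultdict(Counter) for d in ("tmover", "resid_change", "bmonth")}

    # Group person-month records by person within wave.
    persons = defaultdict(list)
    for r in rows:
        persons[(r["SSUID"], r["PNUM"], r["SWAVE"])].append(r)

    for (_, _, wave), recs in persons.items():
        recs.sort(key=lambda r: r["MONTHCODE"])
        seen_spells = set()
        prev = None
        for r in recs:
            month = r["MONTHCODE"]
            # Definition A: mover flag on the monthly record.
            if 2 <= r["TMOVER"] <= 7:
                out["tmover"][wave][month] += 1
            # Definition B: residence ID differs from the previous month;
            # the move is attributed to the month the new ID first appears.
            if prev is not None and r["ERESIDENCEID"] != prev["ERESIDENCEID"]:
                out["resid_change"][wave][month] += 1
            # Definition C: one count per residence spell, at its ERH_BMONTH.
            if r["ERESIDENCEID"] not in seen_spells:
                seen_spells.add(r["ERESIDENCEID"])
                out["bmonth"][wave][r["ERH_BMONTH"]] += 1
            prev = r
    return out

# Toy data (invented values): person 101 moves in June, person 102 never moves.
rows = []
for m in range(1, 13):
    rows.append({"SSUID": "A", "PNUM": 101, "SWAVE": 2, "MONTHCODE": m,
                 "TMOVER": 3 if m == 6 else 1,
                 "ERESIDENCEID": "R1" if m < 6 else "R2",
                 "ERH_BMONTH": 1 if m < 6 else 6})
    rows.append({"SSUID": "A", "PNUM": 102, "SWAVE": 2, "MONTHCODE": m,
                 "TMOVER": 1, "ERESIDENCEID": "R9", "ERH_BMONTH": 1})

counts = movers_by_month(rows)
for definition, by_wave in counts.items():
    for wave, months in sorted(by_wave.items()):
        print(definition, wave, sorted(months.items()))
```

On the toy rows, the first two definitions find one move in June, while the ERH_BMONTH definition also reports two January spell starts that are not moves at all. That is the kind of start-of-spell artifact diagnostic 5 is meant to catch.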
My guess is that the issue is related to the redesigned 2014 SIPP’s annual reference period and residence-spell construction, not a change in the formal definition of TMOVER. The 2014 panel moved to a one-year reference period, and that redesign created known concerns about recall, transitions, and panel attrition. So identical variable names and labels do not necessarily mean the month-of-move distribution is comparable to later SIPP files without additional checks.
The key distinction is this:
TMOVER flags whether the respondent moved in a given month.
ERH_BMONTH identifies the beginning month of the residence spell.
Those are usually consistent, but they are not conceptually identical. If you are trying to produce a seasonal distribution of moves, I would treat TMOVER by person-month as the primary measure and ERH_BMONTH as a spell-level consistency check.
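To see how often the two measures actually disagree, a record-level cross-check along these lines might help. Again a hedged sketch: the function name and category labels are made up, the rows are invented, and it assumes MONTHCODE and ERH_BMONTH are on the same 1 to 12 scale on each person-month record.

```python
from collections import Counter

def tmover_vs_bmonth(rows):
    """Classify person-month records by agreement between the two measures."""
    tally = Counter()
    for r in rows:
        moved = 2 <= r["TMOVER"] <= 7
        spell_starts_here = r["ERH_BMONTH"] == r["MONTHCODE"]
        if moved and spell_starts_here:
            tally["agree"] += 1             # both say a move happened this month
        elif moved:
            tally["tmover_only"] += 1       # mover flag, but spell starts elsewhere
        elif spell_starts_here:
            tally["spell_start_only"] += 1  # spell start with no mover flag
    return tally

# Invented example records.
sample = [
    # Non-mover whose residence spell is coded as starting in month 1.
    {"MONTHCODE": 1, "TMOVER": 1, "ERH_BMONTH": 1},
    {"MONTHCODE": 2, "TMOVER": 1, "ERH_BMONTH": 1},
    # June mover whose new spell begins in June.
    {"MONTHCODE": 6, "TMOVER": 3, "ERH_BMONTH": 6},
]
result = tmover_vs_bmonth(sample)
print(dict(result))
```

If the "spell_start_only" group is large and concentrated in January, that would suggest ERH_BMONTH = 1 is mostly marking the first observed month rather than real moves.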
So I would not say “the 2014 data are unusable.” I would say: first rule out a spell-start artifact, then rule out pooling/weighting across waves, then check whether the remaining pattern is a real 2014-panel redesign artifact.
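For the pooling/weighting step, one way to compare unweighted, weighted, and within-wave-normalized monthly shares is sketched below. It assumes the input has already been restricted to mover records and that the person weight is stored in a field such as WPFINWGT; the function name, the weight values, and the toy rows are invented for illustration.

```python
from collections import defaultdict

def monthly_shares(mover_rows, weight=None, normalize_within_wave=False):
    """Pooled share of moves by month.

    weight: name of the weight field, or None for unweighted counts.
    normalize_within_wave: rescale weights so each wave sums to 1 before
    pooling, so that no single wave dominates the pooled distribution.
    """
    wave_total = defaultdict(float)
    for r in mover_rows:
        wave_total[r["SWAVE"]] += r[weight] if weight else 1.0

    by_month = defaultdict(float)
    for r in mover_rows:
        w = r[weight] if weight else 1.0
        if normalize_within_wave:
            w /= wave_total[r["SWAVE"]]
        by_month[r["MONTHCODE"]] += w

    total = sum(by_month.values())
    return {m: v / total for m, v in sorted(by_month.items())}

# Invented mover records: Wave 3 has much larger weights and a January skew.
movers = [
    {"SWAVE": 2, "MONTHCODE": 1, "WPFINWGT": 100.0},
    {"SWAVE": 2, "MONTHCODE": 6, "WPFINWGT": 100.0},
    {"SWAVE": 3, "MONTHCODE": 1, "WPFINWGT": 3000.0},
    {"SWAVE": 3, "MONTHCODE": 6, "WPFINWGT": 1000.0},
]
unweighted = monthly_shares(movers)
weighted = monthly_shares(movers, weight="WPFINWGT")
normalized = monthly_shares(movers, weight="WPFINWGT", normalize_within_wave=True)
print(unweighted, weighted, normalized, sep="\n")
```

If the January share moves a lot between these three versions, the pooled pattern is probably being driven by one wave's weights rather than by the mover definition.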