Guest post: Anuska Maity on the research process and lessons on plotting and writing

From Running Analyses to Doing Research: Lessons from My First Paper

Anuska Maity

Introduction

When I entered IIIT Hyderabad, I knew that I wanted to pursue research. I believed that even small contributions could have lasting impact, but my understanding of what research involved was limited. I assumed it primarily meant reading prior work, proposing something novel, and validating ideas through analyses.

This post reflects on how that understanding evolved during the process of working on my first research paper. In particular, I discuss how I learned that research is not only about running analyses or achieving statistically significant results, but about carefully testing assumptions, diagnosing failures, curating data responsibly, and using visualization as a tool for reasoning and debugging.

I hope this reflection is useful for students who are just beginning their research journeys.


Phase 1: Learning the Task Without Understanding the Process

My initial months in the lab were largely devoted to reading papers. I focused on understanding what memorability meant, how it differed from memory more broadly, and whether cue memorability was a meaningful construct given prior work on target memorability.

At this stage, I treated published claims as largely unquestionable. Results presented in papers felt inherently reliable. My reading strategy was systematic but superficial: abstract first, conclusions next, results if necessary, and the full paper only when deeper understanding or replication was required. While this approach helped me navigate the literature efficiently, I had not yet learned to critically evaluate assumptions or identify gaps in reasoning.

Early analyses appeared promising, and we initially aimed for a conference submission. However, it soon became clear that the work was not yet sufficiently mature, and around the same time I transitioned into the MS by Research program.


Phase 2: Executing Analyses Versus Owning the Research Narrative

As the project progressed, a pattern emerged. I was effective at implementing analyses suggested by my advisor, reporting results, and iterating based on feedback. However, I struggled to independently drive the project.

Specifically, I found it challenging to:

  • Integrate individual analyses into a coherent research narrative

  • Generate new hypotheses in response to unexpected results

  • Reconceptualize the problem when progress stalled

Over time, the distinction between executing analyses and owning the scientific process became increasingly clear.


Phase 3: Dataset Revision and Non-Replication

A major turning point occurred when we re-examined our dataset and identified issues in the original curation process. Addressing these flaws required replacing the dataset entirely, despite having already obtained a large number of results.

This raised a critical question: Would our findings replicate?

While some results remained stable, others disappeared entirely. This experience highlighted an important lesson: results can appear convincing while being driven by data artifacts. Only after working with a more carefully curated dataset was I able to distinguish robust effects from data-dependent ones.

This fundamentally reshaped how I interpret empirical findings.


Phase 4: Debugging Models Through Visualization

As analyses were finalized, additional issues emerged—particularly related to embedding choices. Initially, pretrained embeddings seemed like neutral components of the modeling pipeline. However, visualizing predicted memorability scores revealed unexpected clustering and artificially sharp peaks.
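For readers who want to run the same kind of check, here is a minimal sketch of the diagnostic idea, not our actual pipeline: the score arrays below are synthetic placeholders, and the quantized predictions are only there to mimic the "many near-identical outputs" pattern we saw.

```python
# A minimal, self-contained sketch of the distribution check described above.
# `y_true` and `y_pred` are synthetic stand-ins for observed memorability scores
# and model predictions; in practice you would load your own arrays.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
y_true = rng.beta(4, 4, size=500)   # placeholder observed scores in [0, 1]
y_pred = np.round(y_true, 1)        # coarse predictions: mimics near-identical outputs

fig, axes = plt.subplots(1, 2, figsize=(9, 3), sharex=True)
axes[0].hist(y_pred, bins=40)
axes[0].set_title("Predicted scores (sharp peaks)")
axes[1].hist(y_true, bins=40)
axes[1].set_title("Observed scores (broader spread)")
for ax in axes:
    ax.set_xlabel("memorability score")
axes[0].set_ylabel("count")
fig.tight_layout()
plt.show()
```

The specific plot matters less than the habit: distribution-level views expose clustering and spikes that means, correlations, and other summary statistics can hide.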

These diagnostic visualizations prompted deeper investigation, leading to several insights:

  • Different pretrained embeddings encode different inductive biases

  • FastText embeddings trained on different corpora (e.g., Wikipedia vs. Common Crawl) can yield meaningfully different results

  • More complex or “advanced” models are not necessarily better for cognitive interpretability

In our case, FastText’s subword modeling produced embeddings for all out-of-vocabulary words, but many of these embeddings were highly similar. This reduced interpretability for our cognitive task. Switching to Word2Vec, while leaving some words without embeddings, resulted in more meaningful representations.
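As a rough illustration of that trade-off, the sketch below uses gensim to contrast how the two embedding families handle a word outside the training vocabulary. The .bin paths are placeholders for pretrained vectors downloaded locally, and the query word is invented; this is not our exact setup.

```python
# A rough sketch of the FastText vs. Word2Vec out-of-vocabulary (OOV) trade-off.
# File paths are placeholders for locally downloaded pretrained vectors.
from gensim.models import KeyedVectors
from gensim.models.fasttext import load_facebook_vectors

w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                        binary=True)   # Word2Vec: closed vocabulary
ft = load_facebook_vectors("cc.en.300.bin")            # FastText: subword n-grams

word = "unmemorableness"   # illustrative OOV cue word

# Word2Vec simply has no vector for words outside its vocabulary ...
print(word in w2v)         # expected: False

# ... whereas FastText composes a vector from character n-grams for any word.
# Convenient, but many such composed vectors can end up nearly identical,
# which is what hurt interpretability in our setting.
print(ft[word].shape)      # e.g., (300,)
```

Whether the trade-off matters depends on how many task-relevant words actually fall outside the Word2Vec vocabulary, which is itself worth checking before committing to either choice.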

This reinforced the value of visualization as a diagnostic and reasoning tool, rather than merely a post-hoc reporting mechanism.






Figure 1. Comparison of predicted memorability score distributions. The left panel shows artificially steep peaks, indicating that many words received nearly identical predictions; the right panel shows a broader distribution with greater variance after correcting the modeling pipeline. Such distributional visualizations can reveal modeling artifacts that are not evident from summary statistics alone.


Phase 5: Writing as a Methodological Constraint

One of the most unexpected lessons emerged during the writing process itself. Writing consistently exposed gaps in evidence, unclear assumptions, and weak justifications that were not apparent during analysis. Decisions about wording, structure, and citation repeatedly forced us to reassess whether claims were adequately supported by data.

This made it clear that writing is not merely a presentation step but a methodological constraint. In particular, adhering to a Context–Content–Conclusion (CCC) structure required that each claim be situated within prior work, supported by concrete analysis, and followed by a clearly stated conclusion. Whenever this structure could not be maintained, it usually signaled the need for additional analyses, revised modeling choices, or more cautious interpretation.

As a result, clarity in writing directly influenced experimental and modeling decisions, leading to stronger and more defensible conclusions.


Practical Lessons

Several practical lessons emerged from this process:

  • Examine data distributions before trusting model outputs

  • Choose models based on task constraints rather than novelty

  • Use nested cross-validation when working with small datasets (see the sketch after this list)

  • Set random seeds consistently across libraries

  • Maintain well-organized, reproducible code
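
To make the cross-validation and seeding points concrete, here is a minimal scikit-learn sketch. The data are synthetic placeholders and Ridge regression is just an illustrative estimator, not necessarily the model we used.

```python
# Nested cross-validation sketch: the inner loop tunes hyperparameters,
# the outer loop estimates generalization on folds the tuner never saw.
# A single SEED constant is reused everywhere for reproducibility.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

SEED = 42
rng = np.random.default_rng(SEED)
X = rng.normal(size=(120, 10))   # placeholder features (small dataset)
y = rng.normal(size=120)         # placeholder targets

inner_cv = KFold(n_splits=5, shuffle=True, random_state=SEED)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=SEED)

# Inner loop: pick the regularization strength on each training fold.
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}, cv=inner_cv)

# Outer loop: score the tuned model on held-out folds only.
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
print(f"Nested CV R^2: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The estimator can be swapped out freely; what matters is that hyperparameter selection never touches the folds used for the final estimate, and that the seed is defined once and passed to every library that consumes randomness.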

Iteration—both in analysis and writing—is unavoidable. Non-replication, failed experiments, and frustration are integral parts of the research process.


Conclusion

Research is not defined by smooth progress or clean results. It is defined by careful reasoning, critical evaluation, and the willingness to repeatedly revisit assumptions.

For students at the beginning of their research journeys, confusion and frustration are normal. Over time, these experiences accumulate into intuition and confidence. In retrospect, the growth becomes visible—and deeply rewarding.



Declarations

ChatGPT was used to rephrase my thoughts into a coherent blog post format.
