Correcting the Record

This year marks my fifteenth year writing this blog. Since the early days of this space, I’ve tried to maintain a few core principles about the things I write (and have written). While I won’t get into all of those principles in this post, the one that I’ve honored since the start is “Don’t rewrite history.” When I write a post, it captures my thoughts, emotions, and working environment at that moment. Every post represents a snapshot in time. If I revisit a post a year or two down the road, I avoid rewriting or editing it. Sure, I may correct a spelling or grammar mistake that I missed earlier, but I won’t rewrite the content or change the meaning or intent of the original post. By adopting this principle, I wanted to honor my past self (and my thinking at the time) and give my future self something to look back on. Through fifteen years of blog posts, I can’t remember revising the intent or meaning of a single post.

Today, I’ve decided to revise two posts from the past. Obviously, I’m doing this after great reflection, but I believe it is warranted. Looking through a few old posts on ChatGPT and other generative artificial intelligence (GenAI) tools, I found claims I had made about GenAI detectors that I’d like to revise. I don’t want people coming upon one of those past posts and having it inform their practice. So, I’ve added disclaimers to a couple of old posts. Here’s the issue.

I’ve stopped using AI detectors because I don’t trust them. I’ve read a bunch of reports that say the detectors give false positives, meaning they incorrectly flag a person’s writing as being AI-generated. While this is concerning in itself, more concerning is the fact that the tools more often flag writing from non-native English writers as being AI-generated. Take this study from researchers at Stanford University. The researchers collected 91 human-written essays from a Test of English as a Foreign Language (TOEFL) repository. They report:

“(The detectors) incorrectly labeled more than half of the TOEFL essays as “AI-generated” (average false-positive rate: 61.3%). All detectors unanimously identified 19.8% of the human-written TOEFL essays as AI authored, and at least one detector flagged 97.8% of TOEFL essays as AI generated. Upon closer inspection, the unanimously identified TOEFL essays exhibited significantly lower text perplexity.” (Liang et al., 2023, p. 1)
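
If “perplexity” is new to you: it measures how predictable a passage is to a language model, and detectors treat low perplexity as a sign of machine authorship. Non-native writers often use simpler, more common phrasing, which also lowers perplexity, and that’s the bias the study surfaces. Here’s a minimal sketch of the measurement, assuming GPT-2 via the Hugging Face transformers library as an illustrative stand-in; the commercial detectors in the study obviously use their own models, not this code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small open model used purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated average token loss; lower means more predictable text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels makes the model return the
        # cross-entropy loss over the passage.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

# Plainer, more predictable phrasing tends to score lower, and detectors
# read "low perplexity" as "machine-written"; that is how essays by
# non-native English writers end up flagged.
print(perplexity("The study was good and the results were good."))
print(perplexity("The quixotic methodology nonetheless yielded replicable findings."))
```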

In response to these findings, I’ve joined other institutions (like Vanderbilt University) in advising my colleagues to refrain from using AI detectors to examine student writing for AI-generated text. I’ve also added disclaimers to any previous posts that suggest AI detectors can be used in this way. It’s a deviation from my past practices and the core principles of this blog, but I believe it is warranted.

The next logical question that some folks may have is “What do I do if I suspect a student is presenting AI-generated text as their work? If I can’t use an AI detector, what can I do?”

I’ll dig into that topic in next week’s post.

References:
Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). GPT detectors are biased against non-native English writers. Patterns, 4(7).

Bias and Generative AI

Over the last eighteen months, I’ve written about my experiences and perspectives with generative artificial intelligence (GenAI) on this blog. I’ve reflected on my efforts using GenAI with my students and with my work. Throughout that time, I’ve mostly taken a cautiously optimistic stance with GenAI tools like ChatGPT. This week, I thought I’d spend some time outlining one of my biggest reservations with these tools: the bias inherent in the design and use of GenAI tools.

Just to be clear at the start of this discussion, I don’t know if anyone is intentionally designing or using these tools to be biased. Maybe there’s some evil mad computer scientist out there creating GenAI tools to espouse some heinous stuff, but I doubt it. That’s also not the focus of this post or what I believe. I think these GenAI tools are built on human-collected data. The tools function through programming that humans have written. Humans also interpret the responses that GenAI tools create. That’s a lot of human influence and interaction. And since humans are complex and complicated beings with our own perspectives, opinions, and experiences, we create a lot of messiness with the design and use of GenAI tools. One way this messiness shows itself is in the biased responses that GenAI sometimes generates. (The human messiness can also cause GenAI to just make up information and draw false conclusions, but those are topics for different posts.)

Over the last few months, I’ve been collecting articles about the biased nature of GenAI. I’ve been sharing these with colleagues and with my students, and weaving the content into professional development sessions and lessons I’ve been facilitating. While there are a bunch of examples out there, I wanted to highlight a few that I find particularly troubling. Take the article titled Humans are biased. Generative AI is even worse, which Bloomberg published a few months ago. In the article, the authors detailed research on an image-creating GenAI tool called Stable Diffusion. They asked the tool to generate thousands of images related to different job titles and crime, and the results were unsettling. When images of different job titles were examined based on skin tones, “image sets generated for every high-paying job were dominated by subjects with lighter skin tones, while subjects with darker skin tones were more commonly generated by prompts like ‘fast-food worker’ and ‘social worker’.” When examining images of different job titles based on perceived gender, the researchers found that “most occupations in the dataset were dominated by men, except for low-paying jobs like housekeeper and cashier.” I encourage you to read the article. Besides the data and cool graphics, the information shared is both insightful and extremely troubling.

A few of the other studies I have shared were conducted by the HR technology company Textio. In a series of studies, Kieran Snyder examined the responses that ChatGPT created for different job-related prompts. In one study, Snyder asked ChatGPT to offer performance feedback for different job titles. In the responses, ChatGPT often relied on gender stereotypes when choosing pronouns. For example, ChatGPT-generated performance reviews for kindergarten teachers always used the pronoun “she.” ChatGPT-generated performance reviews for construction workers always used the pronoun “he.” In another study, Snyder examined 200 performance reviews that ChatGPT generated based on the following prompts:

  • “Write constructive performance feedback for a marketer who studied at Howard who has had a rough first year”
  • “Write constructive performance feedback for a marketer who studied at Harvard who has had a rough first year”

Again, the results are troubling. The AI-generated reviews for graduates of Howard University (an HBCU) often included phrases like “missing technical skills” and “doesn’t get along with others.” The ones for Harvard graduates often included phrases like “micromanages others” and “lacks creativity.” Clearly, the programming, data collection, and analysis that ChatGPT relies upon are leading it to generate biased responses.

There are other studies and articles I could include in this post, but I’m sure you get the idea. Despite all of this information, though, I want to reiterate my “cautiously optimistic” stance with GenAI. These tools still have value, regardless of the biased and problematic responses they create. If anything, they offer a mirror to the larger prejudices and misconceptions in our society and give educators new opportunities to teach critical literacy. And that’s the biggest takeaway for me. People are using these GenAI tools in all sorts of creative ways. If we don’t teach our students how to critically analyze the responses they receive from GenAI tools, who will?

More Feedback with GenAI

If you’ve been reading this blog for a bit, you may remember that I’ve been experimenting with using generative artificial intelligence (GenAI) to support my students’ writing. Last fall, I wrote several posts on my efforts using ChatGPT to provide feedback on drafts that my students wrote. My goal was to use GenAI to provide some immediate corrective feedback to students on things like grammar, spelling, and formatting, which would allow me to focus on higher-order feedback. If you’re interested in that journey, check out these blog posts from the fall:

Feedback with ChatGPT
Discussing Feedback with ChatGPT
Responding to ChatGPT Feedback

My graduate students submitted their first writing assignments last week, and I incorporated GenAI into the writing and revision process again. This time, I chose to use Claude to provide feedback. Claude allows users to upload documents that the tool can then interact with. Last fall, my students had to copy and paste segments of their paper into ChatGPT to get feedback. With Claude, the students could upload an entire PDF of their paper and have the tool interact with the whole document. ChatGPT (and other GenAI tools) offer file upload functionality in their paid versions, but I didn’t want my students to pay for a tool. So, after playing around with Claude, I decided to implement the tool in this iteration of the activity.

To provide some support and guidance with using Claude, I offered the following prompts (and recorded a short video demonstrating the process):

  • Acting as a writing tutor, can you offer suggestions for improving the structure and organization of this paper without revising it?
  • Acting as a writing tutor, can you offer suggestions for improving the grammar and word usage of this paper without revising it?
  • Acting as a writing tutor, can you offer suggestions for improving the in-text citations of this paper without revising it? Please follow the APA 7th Edition guidelines.
  • Acting as a writing tutor, can you offer suggestions for improving the references of this paper without revising it? Please follow the APA 7th Edition guidelines.
  • Acting as a writing tutor, can you offer suggestions for improving the paper formatting of this paper without revising it? Please follow the APA 7th Edition guidelines for a student paper.

You may notice that the prompts have become more specific than the one I offered to students last fall. After my students submitted their revised drafts last semester, I analyzed their common mistakes and the types of feedback that ChatGPT gave them. I realized that the prompts needed to be more specific to better address the errors my students were making. To chart their revision process, I had students submit their first draft, a copy of Claude’s feedback, and their revised drafts. This gave me a window into the overall feedback process and how it informed my students’ work.
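
For readers who would rather script this feedback loop than work in the chat interface, the same workflow can be approximated with the Anthropic API. Here’s a minimal sketch, assuming the Python SDK’s base64 PDF document blocks and a current model alias; the file name is a placeholder and the prompt list is trimmed for brevity (my students used the web interface, not this code):

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Two of the five tutor prompts, trimmed here for brevity.
PROMPTS = [
    "Acting as a writing tutor, can you offer suggestions for improving the "
    "structure and organization of this paper without revising it?",
    "Acting as a writing tutor, can you offer suggestions for improving the "
    "grammar and word usage of this paper without revising it?",
]

# Load the student's paper as a base64-encoded PDF (placeholder file name).
with open("student_paper.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

for prompt in PROMPTS:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model alias
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                # The whole PDF rides along with every prompt, mirroring
                # how students uploaded their full paper in the chat UI.
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {"type": "text", "text": prompt},
            ],
        }],
    )
    print(response.content[0].text, "\n")
```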

Here are some takeaways from this iteration of GenAI feedback:

  1. Claude mostly gave indirect feedback. Unless prompted for more specifics, Claude tended to give indirect feedback on students’ writing. For example, for one student’s paper, Claude provided this feedback: “Review verb tenses. Some sentences shift between present and past tense. Using past tense consistently for summarizing and discussing studies is appropriate.” This type of indirect feedback is helpful to students, but it requires that they be able to recognize the difference between verb tenses. Students tend to respond better to direct feedback, where specific errors are identified. Looking at the Claude feedback that my students submitted, some asked follow-up questions seeking specific examples of some of the issues. I’ll probably include more guidance on asking follow-up questions next time.
  2. Claude made some mistakes. This shouldn’t be a huge surprise, but Claude didn’t always get it right, especially with APA formatting. The most common mistake that Claude made was with the number of authors to cite for a work with three or more authors. While the reference entry includes the names of all of the authors (up to 20), the in-text citation only includes the first author, followed by “et al.” Claude consistently bungled this (I’ve sketched the rule itself just after this list).
  3. My students sought and appreciated the feedback. Despite the mistakes that Claude made and the type of feedback it offered, my students responded favorably to the activity. While I suggested that students could use whichever prompts they felt they needed, most of them used all five prompts to receive feedback on their papers. I also surveyed the students after this exercise and asked how likely they were to choose to use an AI tool again to guide their writing and revisions. Almost all of them said they planned to use GenAI to support their work. For example, one student wrote, “I would likely use an AI tool again to guide my writing and revisions. I feel that, as long as I double check that its recommendations align with formatting guidelines, it doesn’t really hurt to have another set of (artificial) eyes look over my paper.” It’s clear that students saw the benefits and also recognized the limitations of AI-generated feedback, at least concerning their own learning.
  4. Student use did not always translate to their practice. All of the students in this graduate class are teachers. While they work in different settings and environments, they all support their own students’ learning. When asked whether they planned to use GenAI with their students, half of the respondents did not foresee using it. As one student wrote, “I do not think that I would use an AI tool in my classroom. To be frank, I’m afraid that students would use this technology to do the work for them, and not to just assist them.” While my students saw the benefits to their own learning, they weren’t ready to incorporate GenAI tools into the classroom.
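
As promised in the second takeaway, here’s a quick sketch of the APA citation rule that kept tripping Claude up. This is illustrative Python I wrote for this post, not part of the class activity:

```python
# The APA 7 in-text citation rule: a parenthetical citation names at most
# two authors; with three or more, only the first author appears, followed
# by "et al." (The reference entry, by contrast, lists every author, up to 20.)
def apa_in_text(authors: list[str], year: int) -> str:
    """Build an APA 7 parenthetical citation from a list of surnames."""
    if len(authors) == 1:
        return f"({authors[0]}, {year})"
    if len(authors) == 2:
        return f"({authors[0]} & {authors[1]}, {year})"
    return f"({authors[0]} et al., {year})"

# The Liang et al. study cited in an earlier post has five authors:
print(apa_in_text(["Liang", "Yuksekgonul", "Mao", "Wu", "Zou"], 2023))
# -> (Liang et al., 2023)
```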

Let’s get phygital!

This weekend, I attended the annual meeting of the American Association of Colleges for Teacher Education (AACTE), held in Denver, CO. During the conference, I attended a bunch of sessions and panel discussions and got to co-present a session with a colleague. I usually come back from conferences with my head full of new ideas and concepts, and this weekend was no different. I anticipate that I’ll be spending the next post or two unpacking different things I encountered at the conference, but this week, I decided to get “phygital.”

Even though Grammarly tries to correct me every time I type it, phygital is a word. I heard the word for the first time this weekend and immediately started searching for what it meant and how it applied to education. The best definition I could find referred to phygital as “a blend of physical and digital environments.” Educationally, this would mean that a phygital space would mix physical, face-to-face learning with some online components. To me, that sounded a lot like other terminology that educators have been using for a while. Blended learning. Hybrid learning. Is phygital just a rebranding of those terms? I didn’t know, so I went to one of my new favorite AI tools (Perplexity) for help. Perplexity acts a lot like ChatGPT, but it cites its sources, which helps to reduce the hallucinations and misinformation that some of the other AI chatbots are prone to producing. So, I asked Perplexity, “What does phygital mean in the context of teaching and learning? How does it differ from blended or hybrid learning?” And here’s how Perplexity responded:

“In the context of teaching and learning, ‘phygital’ refers to the integration of physical and digital elements to create a seamless learning experience. It involves tightly integrating digital activities into face-to-face sequences to enhance the learning process. Phygital learning aims to bridge the gap between traditional classroom settings and digital technology, allowing for a more engaging and effective learning environment.

“Phygital learning differs from blended or hybrid learning in that it goes beyond simply combining traditional and digital learning experiences. While blended or hybrid learning involves a mix of in-person and online instruction, phygital learning specifically focuses on tightly integrating digital activities into face-to-face sequences to create a more dynamic and effective learning environment.”

While this description helped to clarify some things, I was still struggling to see how this was different from blended or hybrid learning. So, I asked a follow-up question. I asked, “Can you provide an example of a phygital learning activity in a science classroom?” Here’s how Perplexity responded:

“An example of a phygital learning activity in a science classroom is the use of augmented reality (AR) simulations to conduct virtual science experiments. This approach allows students to engage in hands-on learning experiences by interacting with digital elements superimposed on the real world, such as observing and manipulating virtual chemical reactions or exploring complex scientific concepts through interactive AR applications. By integrating AR simulations into the science curriculum, students can gain a deeper understanding of scientific principles and phenomena while fostering sustained interest and engagement in the subject.”

This made it clearer. While blended (or hybrid) learning usually involves interacting with either face-to-face instruction or online instruction at any given moment, phygital suggests that we can create learning environments where those traditional distinctions no longer exist. Students can interact with the physical and the digital at the same time, with one enhancing and supporting the other. While I’m still wrapping my head around this new terminology, I think the term “phygital” offers new ways to describe some of the learning experiences on the horizon as the use of virtual reality, augmented reality, and artificial intelligence continues to expand.

References (from Perplexity):

Unlocking the Potential of Phygital Learning and Education
Phygital Learning – Ahead of the Curve
Phygital Learning: Bridging Physical and Digital Education

Concerns about AI

A few weeks ago, I presented at the Association for Middle Level Education conference with my colleagues, Kim McCollum-Clark and Jen Shettel. Our session focused on Harnessing the Power of Artificial Intelligence (AI) in the Middle-Level Classroom and was attended by almost 200 teachers, librarians, and administrators. While we thought the session would be well attended, we had more attendees than seats. It was really exciting to share our perspectives on ChatGPT and other generative AI tools with so many people. To be honest, it was also intimidating to present to a group of that size. From the feedback we’ve received, however, it seems that the session was a success.

We started the presentation by asking the attendees to write down their concerns about AI. After they shared their responses in small groups, we led a larger conversation around these concerns before leading into the content of the session. We hoped to address the concerns without having them be the focus of our presentation. At the end of the session, we collected the participants’ written responses and spent some time later that day looking through the concerns. We grouped their concerns into several categories. Looking at the categories, I decided they would make a great blog post.

Concern #1: Plagiarism and cheating
This was overwhelmingly the most popular response from the written concerns we collected. Dozens of participants wrote some variation of “How do I avoid my students using AI to cheat?” While this wasn’t an explicit focus of our session, it is clear that teachers are worried that students will use AI to do their work for them.

Concern #2: Academic impacts of AI
While this concern is somewhat related to the first one, it is more focused on the long-term impacts of AI use on student learning and ability. Some participants worried that AI would “stifle student authenticity” while others worried students would “not learn the writing process.” Another communicated that they worried their students would lose the opportunity to “engage in creative struggle.” Another participant wrote that students would lose “critical thinking and reasoning skills.” It was interesting to see how these attendees moved from short-term impacts (focusing on plagiarism of a single assignment) to considering the more global and far-reaching impacts of AI use on student learning.

Concern #3: Teaching with AI
Without doing any exhaustive statistical analysis, I’d estimate this was the third most popular category of concern we identified. Participants worried about how they could use AI in their classrooms. One participant wrote that they felt teachers were “not equipped with the knowledge and tools to be successful” while another was concerned about the “lack of access to explicit instruction for teachers wanting to embrace AI in their classroom.” Thankfully, this was the focus of our session, and we provided several curricular examples for integrating AI into middle-level classrooms.

Concern #4: Responsible and ethical use
Beyond considering the academic impacts of AI use, some participants worried about how to help students “use AI responsibly” and “appropriately.” Another worried about how they could teach “the kids to use AI ethically and effectively.” While this category wasn’t as large as the others, it was clear that some attendees saw their role as not just educating students in academic subjects, but as preparing moral beings who interact with others through the digital tools they use.