Asked  3 Months ago    Answers:  5   Viewed   410 times

I've been struggling with this for hours, so I decided to ask the experts here for help:

I want to modify an existing Excel sheet without overwriting its content. The file contains other sheets as well, and I don't want to affect them.

I've created some sample code, but I'm not sure how to preserve the second sheet that I want to keep.

import pandas as pd

t=pd.date_range('2004-01-31', freq='M', periods=4)
first=pd.DataFrame({'A':[1,1,1,1],
             'B':[2,2,2,2]}, index=t)
first.index=first.index.strftime('%Y-%m-%d')
writer=pd.ExcelWriter('test.xlsx')
first.to_excel(writer, sheet_name='Here')
first.to_excel(writer, sheet_name='Keep')
writer.close()  # write the workbook to disk

# How do I update cells A5:C6 of sheet 'Here' with the following, without overwriting the rest?
# I also want to keep the sheet 'Keep'.
update=pd.DataFrame({'A':[3,4],
                     'B':[4,5]}, index=pd.date_range('2004-04-30', 
                                                     periods=2,
                                                     freq='M'))

I've searched SO, but I'm still not sure how to write a DataFrame into an existing sheet.

Example I've tried:

import openpyxl
xfile = openpyxl.load_workbook('test.xlsx')
sheet = xfile['Here']  # look the sheet up by its actual name; there is no sheet called 'test'
sheet['B5']='wrote!!'
xfile.save('test2.xlsx')

 Answers

98

Figured it out by myself:

import pandas as pd
from openpyxl import load_workbook

#Prepare the excel we want to write to
t=pd.date_range('2004-01-31', freq='M', periods=4)
first=pd.DataFrame({'A':[1,1,1,1],
             'B':[2,2,2,2]}, index=t)
first.index=first.index.strftime('%Y-%m-%d')
writer=pd.ExcelWriter('test.xlsx')
first.to_excel(writer, sheet_name='Here')
first.to_excel(writer, sheet_name='Keep')
writer.close()  # flush the initial workbook to disk before reopening it

#read the existing sheets so that openpyxl won't create a new one later
#(assigning writer.book and writer.sheets relies on older pandas; newer releases may reject it)
book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

#update without overwrites
update=pd.DataFrame({'A':[3,4],
                     'B':[4,5]}, index=(pd.date_range('2004-04-30', 
                                                     periods=2,
                                                     freq='M').strftime('%Y-%m-%d')))

#startrow/startcol are zero-based, so the block's top-left corner lands in C2
update.to_excel(writer, sheet_name="Here", startrow=1, startcol=2)

writer.close()
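
On newer pandas the same result can probably be reached without touching writer.book at all. The sketch below is only an illustration under some assumptions: it assumes pandas 1.4 or later (for if_sheet_exists="overlay") with openpyxl installed, it writes to a temporary path instead of the real test.xlsx, and it spells out the index dates by hand. It targets A5:C6 as asked in the question, so startrow=4 and header=False are my choices rather than the values used above.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Temporary path so the sketch does not touch a real file (assumption).
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")

# Build the two-sheet workbook from the question.
first = pd.DataFrame(
    {"A": [1, 1, 1, 1], "B": [2, 2, 2, 2]},
    index=["2004-01-31", "2004-02-29", "2004-03-31", "2004-04-30"],
)
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    first.to_excel(writer, sheet_name="Here")
    first.to_excel(writer, sheet_name="Keep")

update = pd.DataFrame({"A": [3, 4], "B": [4, 5]}, index=["2004-04-30", "2004-05-31"])

# mode="a" opens the existing file; if_sheet_exists="overlay" writes into the
# existing sheet instead of replacing it or raising an error.
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="overlay") as writer:
    # startrow is zero-based, so 4 means Excel row 5; header=False skips the column names.
    update.to_excel(writer, sheet_name="Here", startrow=4, header=False)

# Read the updated range back with openpyxl to check the result.
ws = load_workbook(path)["Here"]
rows = [[cell.value for cell in row] for row in ws["A5:C6"]]
print(rows)
```

The rows above A5 and the whole "Keep" sheet are left as they were; only the cells covered by the update are rewritten.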
Monday, July 19, 2021
 
Noob_Programmer
answered 3 Months ago
12

The pandas docs say it uses openpyxl for xlsx files. A quick look through the ExcelWriter code suggests that something like this might work:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

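## data_filtered is the DataFrame to write (defined elsewhere).
## The old cols= keyword is now called columns=.
data_filtered.to_excel(writer, sheet_name="Main", columns=['Diff1', 'Diff2'])

writer.close()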
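
If relying on ExcelWriter internals feels fragile, one alternative is to skip the pandas writer and set the cells with openpyxl directly. This is a rough sketch, not the approach from the answer above: write_frame is a hypothetical helper, the workbook and the data_filtered frame are stand-ins created on the spot, and the target position (row 3, column 1) is arbitrary.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "Masterfile.xlsx")

# Stand-in workbook: a "Main" sheet with existing content plus a second sheet.
wb = Workbook()
wb.active.title = "Main"
wb["Main"]["A1"] = "existing"
wb.create_sheet("Other")["A1"] = "keep me"
wb.save(path)

# Hypothetical stand-in for the data_filtered frame.
data_filtered = pd.DataFrame({"Diff1": [0.5, 1.5], "Diff2": [2.5, 3.5]})


def write_frame(ws, df, first_row, first_col):
    """Write df's column names and values into ws, top-left at (first_row, first_col)."""
    for j, name in enumerate(df.columns):
        ws.cell(row=first_row, column=first_col + j, value=name)
    for i, record in enumerate(df.itertuples(index=False), start=1):
        for j, value in enumerate(record):
            ws.cell(row=first_row + i, column=first_col + j, value=value)


# Load the existing workbook, touch only the target cells, and save it back.
book = load_workbook(path)
write_frame(book["Main"], data_filtered[["Diff1", "Diff2"]], first_row=3, first_col=1)
book.save(path)

check = load_workbook(path)
written = [[cell.value for cell in row] for row in check["Main"]["A3:B5"]]
print(check.sheetnames)
print(written)
```

Because the workbook is loaded and saved as a whole, every sheet that is not modified is carried over unchanged.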
Tuesday, June 1, 2021
 
Semirix
answered 5 Months ago
19

You can still use the ExcelFile class (and the sheet_names attribute):

xl = pd.ExcelFile('foo.xls')

xl.sheet_names  # see all sheet names

xl.parse(sheet_name)  # read a specific sheet to DataFrame

See the docs for parse for more options.
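
A small end-to-end sketch of the above, with a few assumptions of mine: it uses a temporary .xlsx file rather than foo.xls (reading .xls needs the separate xlrd engine), and the sheet names "Here" and "Keep" are only sample data.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "foo.xlsx")

# Create a small two-sheet workbook to inspect.
frame = pd.DataFrame({"A": [1, 1], "B": [2, 2]})
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    frame.to_excel(writer, sheet_name="Here", index=False)
    frame.to_excel(writer, sheet_name="Keep", index=False)

# ExcelFile opens the workbook once; sheet_names lists the sheets and
# parse() reads one of them into a DataFrame.
with pd.ExcelFile(path, engine="openpyxl") as xl:
    names = xl.sheet_names
    keep = xl.parse("Keep")

print(names)
print(keep)
```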

Wednesday, June 9, 2021
 
Ticksy
answered 5 Months ago
10

Edits were not making an impact because the process was compiled into an exe that these modules were running through. Exported the sections I needed outside of my anaconda environment and now the process works without a hitch.

Saturday, August 7, 2021
 
mgierw
answered 3 Months ago
77

I'll address the second question first, as it's a more fundamental problem.

The other part of this is another script that has been running ok when it was the only script running, but has since stopped working when the other scripts were introduced.

In the script project attached to your sample, you have 3 files which each define an onEdit() function. This is problematic because each time you define onEdit() you're redefining the same identifier. The project only has a single global scope, so there can only be 1 onEdit() function defined, regardless of how many files your project contains.

Essentially, this is equivalent to what you've defined in your project:

function onEdit(e) {
  console.log("onEdit #1");
}

function onEdit(e) {
  console.log("onEdit #2");
}

function onEdit(e) {
  console.log("onEdit #3");
}

onEdit();

Running the above snippet will only execute the last definition of onEdit().

To accomplish what you're trying to do, you can instead define unique functions for all the actions you want to perform and then, in a single onEdit() definition, you can call those functions. Something like:

function editAction1(e) {
  console.log("edit action #1");
}

function editAction2(e) {
  console.log("edit action #2");
}

function editAction3(e) {
  console.log("edit action #3");
}

function onEdit(e) {
  editAction1(e);
  editAction2(e);
  editAction3(e);
}

onEdit();

When defining an onEdit() trigger, you really want to optimize it so that it can complete its execution as quickly as possible. From the Apps Script best practices, you want to pay particular attention to "Minimize calls to other services" and "Use batch operations".

A few specific tips for you:

  • Avoid repeated calls to the same Apps Script API (e.g. Sheet.getName()). Instead, run it once and store the value in a local variable.
  • As much as possible, avoid making Apps Script API calls within loops and in callback functions passed to methods such as Array.prototype.filter() and Array.prototype.map().
  • When you do need to loop through data, especially when Apps Script API calls are involved, minimize the number of times you iterate through the data.
  • With onEdit() triggers, try to structure the logic so that you identify cases where you can exit early (similar to how you perform the column check before going ahead with manipulating checkboxes). I doubt you actually need to iterate through all of the sheets and update the "Open Action Items" formula for every single edit. If I'm interpreting the formula properly, it's something that should only be done when sheets are added or removed.

Finally, to address the blank rows in your formula output, instead of using SORT() to group the blank rows you can use QUERY() to actually filter them out.

Something like:

=QUERY({ <...array contents...> }, "select * where Col1 is not null")

Note that when using QUERY() you need to be careful that the input data is consistent with regard to type. From the documentation (emphasis mine):

In case of mixed data types in a single column, the majority data type determines the data type of the column for query purposes. Minority data types are considered null values.

In your sample sheet, a lot of the example data varies and doesn't match what you'd actually expect to see (e.g. "dghdgh" as a value in a column meant for dates). This is important given the warning above... when you have mixed data types for a given column (i.e. numbers and strings) whichever type is least prevalent will silently be considered null.
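
To make the rule concrete, here is a toy Python model of what the quoted passage describes. It is only an illustration of the documented behavior, not how QUERY() is implemented: coerce_column is a hypothetical helper, and the way it breaks ties between equally common types is my own assumption.

```python
from collections import Counter
from datetime import date


def coerce_column(values):
    """Keep values of the most common type; replace all others with None."""
    typed = [v for v in values if v is not None]
    if not typed:
        return list(values)
    # Assumption: on a tie, the type seen first wins.
    majority = Counter(type(v) for v in typed).most_common(1)[0][0]
    return [v if type(v) is majority else None for v in values]


# A date column with one stray string, like "dghdgh" in the sample sheet.
column = [date(2021, 9, 3), date(2021, 9, 4), "dghdgh", date(2021, 9, 7)]
print(coerce_column(column))
```

The stray string comes back as None, which is why a filter such as "where Col1 is not null" would silently drop that row.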

After taking a closer, end-to-end look at your sample, I noticed you're performing a very convoluted series of transformations (e.g. in the data sheets there's the hidden "D" column, the QUERY() columns to the right of the actual data, etc.). This all culminates in a large set of parallel QUERY() calls that you're generating via your onEdit() implementation.

This can all be made so much simpler. Here's a pass at simplifying the Apps Script code, which is dependent on also cleaning up the spreadsheet that it's attached to.

function onEdit(e) {
  /*
  Both onEdit actions are specific to a subset of the sheets. This
  regular expression is passed to both functions to facilitate only
  dealing with the desired sheets.
  */
  const validSheetPattern = /^E[0-9]+/;
  
  updateCheckboxes(e, validSheetPattern);
  updateActionItems(e, validSheetPattern);
}

function updateCheckboxes(e, validSheetPattern) {
  const sheet = e.range.getSheet();

  // Return immediately if the checkbox manipulation is unnecessary.
  if (!validSheetPattern.exec(sheet.getName())) return;
  if (e.range.getColumn() != 2) return;

  const needsCheckbox = ["Tech Note", "Intake Process"];
  const checkboxCell = sheet.getRange(e.range.getRow(), 3);
  if (needsCheckbox.includes(e.value)) {
      checkboxCell.insertCheckboxes();
  } else {
      checkboxCell.removeCheckboxes();
  }
}

function updateActionItems(e, validSheetPattern) { 
  const masterSheetName = "Open Action Items";
  const dataLocation = "A3:E";

  /*
  Track the data you need for generating formulas in an array
  of objects. Adding new formulas should be as simple as adding
  another object here, as opposed to duplicating the logic
  below with a growing set of manually indexed variable names
  (e.g. cell1/cell2/cell3, range1/range2/range3, etc.).
  */
  const formulas = [
    {
      location: "A3",
      code: "Tech Note",
    },
    {
      location: "E3",
      code: "Intake Process",
    },
  ];
  
  const masterSheet = e.source.getSheetByName(masterSheetName);
  const sheets = e.source.getSheets();
  
  /*
  Instead of building an array of QUERY() calls, build an array of data ranges that
  can be used in a single QUERY() call.
  */
  let dataRangeParts = [];
  for (const sheet of sheets) {
    // Only call getSheetName() once, instead of multiple times throughout the loop.
    const name = sheet.getSheetName();

    // Skip this iteration of the loop if we're not dealing with a data sheet.
    if (!validSheetPattern.exec(name)) continue;

    dataRangeParts.push(`'${name}'!${dataLocation}`);
  }
  const dataRange = dataRangeParts.join(";");
    
  for (const formula of formulas) {
    /*
    And instead of doing a bunch of intermediate transformations within the sheet,
    just query the data directly in this generated query.
    */
    const query = `SELECT Col5,Col1,Col4 WHERE Col2='${formula.code}' AND Col3=FALSE`;
    const formulaText = `IFERROR(QUERY({${dataRange}},"${query}"),{"","",""})`;
    
    formula.cell = masterSheet.getRange(formula.location);
    formula.cell.setFormula(formulaText);
  }
}

Here's a modified sample spreadsheet that you can reference.

The one concession I made is that the data sheets still have a "Site Code" column, which is automatically populated via a formula. Having all the data in the range(s) you feed into QUERY() makes the overall formulas for the "Open Action Items" sheet much simpler.

Friday, September 3, 2021
 
Packy
answered 2 Months ago